Python Tutorial - Python for Beginners [Full Course]

Captions
Hello everybody, and welcome to chapter one of Python for Everybody. I'm Charles Severance, I'm your instructor, and I welcome you to this class. The basic goal of this class is to teach everybody how to program, regardless of your background. You don't have to be a math whiz, you don't have to be a computer expert; no matter how old you are or what your background is, we want to teach you how to program. So welcome to the course, and welcome to chapter one.

The first thing to understand is that the reason to learn to program is that computers want to do things for us. They are built and designed, and their hardware is set up, so that they basically ask us, "What do you want to do next?" If you grab your phone, it does nothing until you tell it what to do. It's just waiting for you, and all the computer technology around you is generally waiting for you. We can use this for useful things: we could play video games, we could have it help navigate our cars, and someday we might even have self-driving cars. In my mind it's really silly if you spend your whole life not understanding this technology, and I think it's important that we learn to tell these computers what to do rather than just let them increasingly control our lives.

As we'll see, computers aren't very smart on their own. We humans are the ones that imbue them with knowledge, but we need to learn to speak their language. It is much easier for us to learn to speak their language than it is for them to learn to speak ours, although with these cell phones we're starting to see little bits where they can begin to understand. You would be amazed at the 40 or 50 years it has taken us to understand how to build programs that begin to understand. So I'm bringing you into something where you are going to learn the ways of programming and the ways of the computer, because it's easier to teach you how to program
than it is to teach this how to work in your world, even though ultimately the goal is to get this to do work for you. Part of what I'm trying to do is move you from a user perspective, where you just look at the computer as something that someone else has constructed and you are the user of, to the point where you construct new things.

The first kinds of things that you're going to construct are things to solve your own problems. It's very popular now to work on data, and Python is an excellent programming language for data mining and data analysis, and that's a lot of what we're going to do in this course. Really, though, it's a gateway to all kinds of things, like artificial intelligence or gaming or navigation or mobile applications or entertainment. But first you have to learn to program. We have to move from using the computer as a tool to using the tools within the computer that allow us to change how the computer sees the world.

There are a couple of reasons that you might want to be a programmer. Some of you are looking to improve your career, to be paid to work on programming. I've been a paid programmer most of my life, and I like it. It's a good job: you don't have to stand in the mud, you don't have to lift things, you have to use your brain. It has been nice for my career to not be exposed to the elements and to be able to work often wherever I want. But that's actually our secondary goal. Our first goal is to get you to write programs that solve problems that you have to solve. Maybe you have a job as an accountant or a lawyer or something else, and maybe you run across some data. Maybe there's some system that logs your time and it's not quite giving the report that you want, and so you say, "Could I just grab the log data myself and write a program to do some analysis, to say what's the average of this versus that, or the average of some other thing?" And so that's the basic idea
that you'll initially use computers to serve your own ends. That makes it a lot easier to write programs, because you don't have to worry about a million users using your software. If it works for you, then we're happy. It takes a little more training to write software for other people, or for thousands and thousands of other people.

Part of what I want to do is change your perspective. You look at this from the outside, and you click on things. I want to turn this around: I want you to be the person inside this, looking out at the world. As programmers, we are making things inside these computers for the world, and so we want to pull you into being part of this. We want you inside this, or thinking inside this. What you learn is that if you're inside this computer, taking your instructions to build programs to be used by the human (oops, almost dropped that), the human outside the computer, you have things that you need to take advantage of. There's the central processing unit, the memory of this system, the network connection of this system, and the disk drive or permanent storage on this system. As a programmer, you are mediating between all those internal resources, which are not very smart but highly powerful, and what that user wants. We programmers serve the end user, but the computer serves us, so together, between us and all the computer's resources, we can serve the needs of the end user. And we do this by writing code, or programming.

OK, and what is that? Programming is giving a sequence of instructions to the resources inside the computer in a way that accomplishes the goals of the end user. And remember, sometimes we are our own end user. You're not always doing a startup, you're not always writing a mobile gaming system.
Sometimes you're writing something for yourself, and that's okay. Sometimes you're writing something to solve a problem, like crafting: you're doing something that you could do by hand, and you're making some clever little 25- or 100-line program. Other times, like when I work on the open source learning management system Sakai, it is my creativity: I've got an idea and I want to share it with a million users, and so I write my code for an external audience. Code is that sequence of instructions. The computer itself doesn't know how to hand a roster out, but I can write code that will hand a roster out by looking at the data that's inside this computer, inside this application.

If you think about programs, we have programs for computers and programs for humans. Here is an example from a number of years ago now (sooner or later this will be me showing my age): the Macarena. The Macarena is a song that effectively is a sequence of instructions: you put your left hand out, you put your right hand out, you put it on the shoulder, you wiggle, wiggle, wiggle, and you spin around. This is a program for people. I want you to take a quick look at this and see if you can find anything wrong with this particular program, so look really closely. [Music]

So I'll show you: it's got some typographical errors in it. We as humans are really good at reading or hearing typographical errors and correcting them automatically and instantly, but computers are not. Computers are extremely literal. If it saw this "ham" instead of "hand", it would think, "What's a ham, and why am I going to hit someone in the back of the head with a ham? And why would I take my left hand and hit somebody?" These are all bad things, but the computer is going to take us very literally, and so we have to be really precise. The computer just doesn't know the difference between what we mean
and what we say, so we have to be very precise. This is one of the great frustrations that people have when they first start using computers. We have to get these little bits of text exactly right. Computers will blow up with syntax errors, and they seem to make quite a fuss when you make the tiniest of errors, but you'll get used to that. That's not because you're bad or less than awesome; it just means that computers can't compensate when you make small mistakes. You've got to get used to the fact that the computer is intellectually not as strong as you, and so it gets confused really easily, even though when it gets confused it says seemingly mean things to you. You'll get used to that.

OK, so the first thing I want to do is throw up some text, and while this text is up, I want you to count the number of times each word appears and tell me what the most common word is in this text. So here we go. [Music]

OK, so I kind of made that hard on you on purpose by moving around and distracting you and confusing you. But even if it's not moving at all, it's a little bit tricky to do. You probably stare at it a couple of times, and your brain is going back and forth and back and forth. Text analysis is one of the great things that computers are very, very good at. They can translate text, for example, and that's because they've looked at a lot of information. So looking at text is actually something computers are really good at.

If we take a look at the kind of programs that we're going to write to do this kind of thing, this is something that humans are not naturally good at but computers are super good at. Now, I'm not going to have you look closely at this code; this code you will understand in a few weeks. But basically this is a set of instructions to open a file, read that file, read all the words in the file,
create a histogram of all the words in the file, and then search through that histogram to find the most common word and tell us what the most common word is in the file. In this clown file, the word "the" is the most common; it happens seven times. Here's another large file called words.txt, and the word "to" is the most common. Our goal is to get to the point where you can write this on your own, so you can say, "You know what, I've got a problem to solve, which is what's the most common word in this file. I know how to start, I know how to finish, and I know how to do the stuff in the middle." We have to learn this kind of weird language, but when we do, we can count millions of words as easily as we count 20 words. That's the fun of all of this: to teach you this language so that the computer can solve that problem and you don't have to solve it by hand, because you could solve it, but it's not something that you're naturally good at, and it's hard work. Up next, we're going to talk a little bit about the hardware architecture that you're going to be experiencing as you write programs. [Music]

Hello, and welcome back, to hardware architecture. Now you might ask, why do I tell you about hardware architecture? You're probably not going to build any hardware, although it's fun stuff to do, and if you're going to become a computer scientist, which most of you won't want to be, it's a great thing to study. Those who build our hardware are amazingly talented individuals, and it's a really rewarding job. The reason I like talking to you about hardware is that I want to be able to use words at some point, like secondary storage, or central processing unit, or random access memory, or peripherals and input devices, and I want you to be able to understand them. So I'll start with a little piece of hardware called the Raspberry Pi. The Raspberry Pi is a cute little single-board computer.
As we go forward, these things get smaller and smaller, and the interesting thing is that the architecture stays the same but the number of components drops. So I'm going to start by giving you a block diagram of a generic computer and telling you the major parts of it. Then I'm going to show you some really old hardware, some really new hardware, and some hardware of medium age. The medium-age hardware is probably the easiest one to see. The architecture is the same.

The basic block diagram is this. The brains, if there are brains in computers, which there really aren't (the software is the closest thing computers have to brains), but in hardware, the closest thing to a brain the computer has is this, called a microprocessor or a central processing unit. It is designed, three billion times a second, to ask the question, "What do you want me to do next?" These little pins on the back carry our instructions, like 32 or 64 of these pins, and three billion times a second we send an instruction into these things. Now, we can't sit there and talk to it, and so we store the instructions in what's called the main memory. This memory is really fast, and the memory feeds this. Every time the CPU needs a new instruction, it asks the memory where that instruction is. So the memory feeds the CPU an instruction, the CPU does it and says, "Give me another instruction," the memory gives it another instruction, the CPU does it, and that is the basic essence of programming. This asks, "What's next?", and this is where your program is stored, or a program you purchased or that came with your hardware. So your programs end up inside this memory.

In software you tend to program the CPU. If you had bought a desktop computer a number of years back, it would have this thing called the motherboard, and the motherboard is called this
because it kind of connects all the components together. If you buy memory by itself, it does nothing, but it has a place to plug into the motherboard. If you buy a microprocessor, it has a place to plug into the motherboard. And if you buy a hard drive (this is a really old hard drive), it has a place to plug in on the motherboard. So the motherboard connects everything together.

The hard drive is secondary storage. The way secondary storage is different from the main memory (there it is; I've got to unpile this stuff) is this: main memory is really fast, but as soon as you turn the power off, what's in this memory vanishes. So to store files, like word processing files or text files or whatever, you've got to store them on something that lasts a little bit longer, and that's the purpose of the secondary storage. It's permanent: when the power is off, it keeps what it stores. Now, this one here is in such bad shape that it probably isn't storing anything, but it's got these little heads, and it spins around, and they go in and out. We'll have a video later that shows you one of these things that's not in quite as bad a shape. If you look, this has four different platters that are all spinning around. This is just using magnetic material and electronics that magnetize and demagnetize this stuff. Physical disks are often rated in revolutions per minute, and that's how many times this thing spins around. If you've got an old desktop and you hear it spin up, this is the thing that's spinning. It's the place where your operating system, your files, and your applications live while they're stored and while the computer is turned off. Then they're loaded into this while they're running, and then this CPU takes the data from the main memory, and your program runs at three billion operations per second.

So let's talk a little bit about something older. This is probably from the 1960s or 70s. This
actually has, if you're an electrical person, capacitors. Those little silver things are capacitors, these little colored things are resistors, and that's more capacitors, and then there are wires, and wires move everything. Oh wait, that isn't a capacitor, that's a transistor. So when you say that this has millions of transistors, this here has them etched in, and if you go look at a picture of a microprocessor online, you will see that it has millions of these. So the difference between 1960 and today is that this circuitry of capacitors, resistors, and transistors has been miniaturized and put onto this using a photographic process, and they're making them tinier and tinier and putting more and more on.

And if you think going from millions of these to one of these is crazy, the thing that's happening now, and the reason we have whole computers inside our pockets, is that all of this, the CPU, the memory, everything connected, and the storage, is being made smaller and smaller. This little single-board computer called the Raspberry Pi has one thing in it that has the main memory and the CPU, and it has connections for peripherals like keyboards. It doesn't yet have secondary storage on it; the secondary storage gets plugged in right here via USB. And if you take it one step further, to my phone, it's got the secondary storage built right in. So this picture goes from the size of cabinets in the old days all the way down to really tiny, but at the end of the day, inside, it is a highly sophisticated piece of circuitry that asks for instructions one at a time and a main memory that holds the instructions and feeds them.

OK, let's take a look here. The central processor does the thinking. It runs the program; it's what's asking, "What's next?" It's not really smart, but it's really fast, and so we
compensate for the lack of intelligence of this thing by writing really good software that runs really fast. Voice recognition on things like phones is possible because computers have so much storage and run so fast, and the algorithms that do voice recognition are finally starting to work. Input devices, like keyboards and mice and pens, are how things come in. Output devices are things like the screens that we see. The main memory is the fast part of the computer that stores all the programs, and the secondary memory is the permanent storage. (Do I have any USB sticks in here? I don't.) Increasingly, secondary memory is flash memory, solid-state storage with no moving parts, so in a few years you won't even see secondary memory with moving parts. But that's okay; it's still secondary memory, it's still memory that lasts.

And where your place is in here is that you live in the main memory. This is you; you are here. In a sense, when the CPU asks the question "What next?", it is your job to answer that, and you answer that by writing Python code. You'll write a file of Python code, blah blah blah, and then that Python code gets loaded into main memory. There's a magic translation process that happens, and then your code is actually answering this question three billion times a second. You're really out here, but you write a file, the file is loaded in, and then the file runs, and that's your place in the world.

Now, what's actually running is not Python code. There is, as I said, a translation process. You write a Python file, and then Python itself translates this into the actual language known by the microprocessor, which is a series of zeros and ones called machine language. Someday I would love to teach you a class on machine language, but
for now we're going to teach you Python, and we're going to use Python as a crutch. We don't have to talk machine language. If you really wanted to, you could learn how to write machine language, but I assure you Python is far easier to learn. So Python acts as a translator: it translates what you're doing into the machine language, and then the machine language is what's sent back and forth. But even though it's translated to machine language, it is you answering those questions. That's what a program is: you pre-storing your response to the "What next?" question, over and over again.

Here are a couple of videos that you can look at on YouTube. One is about a CPU that looks very much like this CPU that I've got with me. These CPUs run at extremely high heat. When you put your computer on your lap and it starts to heat up, that means it's thinking really, really hard. This is a little old video from a long time ago that shows what happens when you take out the cooling capability of microprocessors and just how hot they can be. The other video that I have is of a hard disk, something like this hard disk that I have except that it works. They turn the power on, and some of them last for a few seconds, some of them last for a few minutes. (I must be allergic to this hard drive, or maybe it's because there's dust in it and I keep spinning it and I sneeze.) It's not a good idea to open them up, but I'm glad somebody opened one up and did what they did and recorded it, so we can all enjoy what it is that they're capable of doing.

OK, so that's a quick introduction to hardware, mostly so that I can use those words going forward. What we're going to talk about next is communicating in the language Python, that is, writing code and putting it into the computer so that it can
execute. [Music]

OK. [Music] Hello, and welcome back, to Python as a language. You'll notice that I'm wearing a hat. Part of the story of the hat is that where I work, here at the University of Michigan School of Information, my office is in a building called North Quad. We call it "Quadwarts" sometimes, because it's sort of got a square and it sort of imitates an Oxford quad, and so it seemed to me to evoke notions of Harry Potter. When we first moved into the building, I joked in one of my classes that we should have a sorting ceremony for all the students as they come into North Quad for the first time. That was cool, and I thought that I would belong in Gryffindor. Everyone wants to be in Gryffindor, right? They're the good guys. But my students told me that I couldn't be in Gryffindor, that I had to be in Slytherin. So you'll see me drinking tea throughout the course out of this teacup; it's my Slytherin teacup. I picked that up when I went down to Florida and visited Harry Potter World.

The reason that I was sorted by my students into Slytherin is that I teach Python, and a python is a snake. If you think about the people from Slytherin, they are capable of talking to snakes, and the class in which we were doing the sorting was a Python class, so it made perfect sense that you would have to be in Slytherin if you were the Python teacher. And of course my name is Charles Severance, and that sounds kind of like Severus Snape, so I just accepted that I'm in Slytherin. You all can be in Gryffindor, but I can't. I'm in Slytherin, so I'm the bad guy, or the good guy, depending on how you look at it. What I'm going to do now is bring you into Slytherin as well, because I'm going to teach you the Python language. Python is the language that we Pythonistas talk. It was invented over 20 years ago by a fellow named Guido van Rossum,
and away we go. Now, even though I'm using this whole snake and Slytherin thing, it turns out that Python was not at all named for Harry Potter, because Python was invented years before Harry Potter was created, and it wasn't named for the snake either. Monty Python's Flying Circus was actually the inspiration for the name Python. Guido van Rossum really wanted to create a programming language that, while it was powerful underneath, in its very nature, was also fun and approachable. That's why Python has recently become so popular: it's easy to learn, but it's also powerful. That's the magic of Python: the ease of learning it, the brevity of the programs, and the power. And so we are going to become Pythonistas.

Now, as you learn to be a software developer using the Python programming language, you are going to encounter syntax errors. I remember when I used to get syntax errors. In my first programming class, I would type on cards and upload those cards to the computer, and the computer would say, "You're not worthy," and I'm like, "Wait a sec, those are pretty good cards. How could you be so critical of me?" It would say "syntax error," and I really got a bad attitude, feeling that somehow this computer didn't like me. I would make cards and it would complain, and I would make changes to the cards and it would still complain, and I'm like, "How can I win in this situation?" You're going to feel the same thing. You're going to be struggling, you're going to be like, "How come this computer hates me?" Let me assure you right now: the computer doesn't hate you. The computer actually loves you. It just is not very good at showing how it loves you, or
telling you how or why it loves you. Syntax errors are not so much Python telling you that you're bad, or that you're an inadequate programmer, or that you should find something else to do. It's really Python's admission that it doesn't understand what you're trying to say. It's frustrating, but you've got to get used to the fact that syntax errors are your friend. Python is saying, "Hey, I got to line seven, and I was doing fine up to line seven, but boy, in line seven there's some little thing. I don't know what the word else means in this context, or you didn't indent it, and so I'm kind of confused. What did you mean? Please, please, please help me." It's so much easier for you to learn Python than it is for Python to figure out what you mean when you're writing code.

We have a number of different ways to encode our instructions when we talk to Python. One is to just run Python interactively on our computer. Hopefully by now you've got it installed, and you just type python at a command prompt, whether a Windows command prompt or a Linux command prompt or a Macintosh command prompt. I've got some examples of how to get this all started, get Python installed, and away you go. You'll notice that when you run the Python interpreter, with the three-chevron prompt (>>>), Python is asking you, "What next?" This is you. It's saying, "I want to talk to you. I want you to tell me some Python to do." If you know the Python language, you know what to say right here. You can type these statements: you can say x = 1, which really means go find a little piece of memory, label it x, and stick 1 in it. print(x) is like: go find that thing that you labeled x, bring me back that number, and tell me what I stored in there. Now, why you want to do this is a different question. These are very simple things, and it's going to take you a while to get the big picture of why we're doing this, so just trust me that you want to
learn these statements, and then later we will successfully turn those into a program. The third line there, x = x + 1, is not as it seems in math. It basically says: go grab the old value of x, add 1 to it, and stick it back in x. So the equal sign really has kind of an arrow to it. Then we say, go look up that x thing that we just changed and print that out, and then we say quit(). So that's us talking to Python. Now, you can type just about any crazy stuff you want in here, and Python will be unhappy and complain to you. What we're going to do next is start talking about the actual language of Python and what it is that we have to say to make Python happy when we're talking to it. [Music]

So now we're going to start learning the actual Python language. What do we say? You can think of this as almost like writing a story. We're going to start with a basic vocabulary, we're going to talk a little bit about lines, or sentences, and then we're going to start talking about how to put those sentences together to make a coherent paragraph, as it were. You just have to accept the fact that when I start teaching you this stuff, it's not going to make sense for about six or seven more chapters, so just bear with me. I remember when I first learned: it went from me being confused, confused, confused, confused, confused, to "Holy mackerel, this is awesome!" I expect many of you will go through that same thing. So just learn the first parts, accept the fact that it doesn't necessarily make sense in the big picture, and bear with us. We'll start with vocabulary, we'll start to make sentences, and then we'll have little short stories and paragraphs.

This is a short story about how to count words in Python. It's got a couple of paragraphs, and we are going to look at all of this stuff eventually. We start with a set of reserved words. And what are
reserved words? They're words that Python expects to mean exactly what Python says they mean when you use them. What that really means is that you're not allowed to use them for any purpose other than the one Python wants. It's part of the contract. It's like when you have a dog and you say, "What did you think of that television program?" and the dog has no idea what you're saying. Then you say, "Do you want to wait until Saturday to go to the veterinarian?" and the dog still doesn't know what you're saying. And then you say, "How would you like to take a walk?" and the dog goes, "Walk! I know what that means," and hits the door. The way the dog hears you is "blah blah blah walk, blah blah blah food, blah blah blah treat, blah blah blah walk." That's kind of how Python looks at these reserved words. When you say class, it goes, "class, oh, I know what that means." If I say zap, it's like, "Oh, zap is something that you get to decide," maybe a variable name. So reserved words are simply words that are reserved for Python, and there are only a few of them, like and, or del, or if, maybe pass, maybe in. A lot of these you won't end up using. These are part of the Python vocabulary.

Now, when we move from words to sentences, you see that a Python program is a series of lines, a series of statements. They have an order, because the computer wants to know what next, what next, what next. So "what next" starts at the beginning. I already talked about an assignment statement, one that basically says x = 2. This is not a mathematical statement. This is a directive that says: take this value 2, this constant 2, and stick it in a location in your memory, and remember that I asked you to name it x. x is a variable, something you made up. You chose that name, but it's Python's job to remember it. So this says go
to whatever that x is there's a 2 in there now pull that x back out add 2 to it which makes it 4 and stick it back in x and so that makes this a 4. so x is a 4. and print x says go look up that thing that was an x and print it out and so these are like each line has something to it i'm using a reserved word well actually that's a function but it's k it's a reserved word too and so there's reserved words and all these things and you combine these there are operators plus as an operator equals as an operator these things do things and we'll learn all this stuff in time so the basic building blocks of lines of python now as we take these lines of python and build them up we end up making paragraphs programming in paragraphs and so one of the things that it's important is i showed you how to do interactive python so you just type python and you type a statement in a statement and statement those get really tiring after about three or four lines of python because you start making mistakes and you have to start over so the the better thing to do is to as your program gets a little larger to write a script put your python instructions in a file and then tell python to read from the file and then run the script as it's entered in that file we tend to name these files with dot py and i've got a series of videos that you can watch to figure out how this all works like i said you can type interactively to python and it's a great way to experiment with python check to see if a statement does what you think it does but script is the way after we are passed one or two lines of code we write it in files and then run it separately so there are a couple of basic patterns and it's really important to understand each of these patterns and like i said we'll teach you these patterns separately and then we'll combine them together and when you combine them together when you say oh that's what a program is so you have to suspend disbelief we have a couple of different patterns one is a 
sequence of steps do this then do this then do this conditional is like skipping something repeated does it over and over and over again computers are really good at repeating stuff much better than people people get tired going over and over doing the same thing and then we have store and retrieve steps as well and so if we take a look at this and we take a look at a python program this is a piece of code this is a little script if you typed this python code into a file and ran it it starts at the beginning and then it goes to the next line and the next line and the next line and python executes the script as you wrote it so it says find a place in your memory called x and stick two into that okay then go to the next one print that out so the program's producing output now go read x and add 2 to it and stick it back in x so x is 4 then print that this side over here this is called a flowchart i'm not going to make you draw flow charts i'm only going to draw them a few times in ways that i think will help you but you can think of it as python when it finishes something it goes on to the next one unless you tell it otherwise finishes this goes on to the next one finishes this goes on to the next one finishes this and now the program is all done and so that's sequential steps you just type them in python runs them they're important but sort of uninteresting because you can only get so far and you can't really make them intelligent because it's always going to do the next one so the next thing we do is what are called conditional steps and this is where it starts to get intelligent i mean where you are able to encode your brain into the computer like oh wait a sec let's only do this step if something is true and the syntax that we tend to use here is the reserved word if okay and so the if is like a fork in the road you can go one way or you can go another way and
you're asking a question so inside the if statement right here there is a question saying is x less than 10 that resolves to a true or false if it's less than 10 that's true if it's not less than 10 it's false and so then what we do is if it's less than 10 we have this indented block of code there's also this colon that tells us we're at the beginning of an indented block of code and so what it basically says is if this is true run that code if it's false skip that code so it can either run it or skip it depending on this question that's being asked now if you look at this code it's pretty obvious what's going on it comes down x is 5 if x is less than 10 that's true so it runs this code and prints out smaller and then it comes back here it de-indents to the next basic sequential step this ends up being kind of a block if x is greater than 20 oh come back come back if x is greater than 20 this turns out to be false because x is 5 and so it skips this so the bigger never comes out and then it continues on and prints finis oops that's a typographical error make that a lowercase print and then prints finis so it comes in runs this skips this and then finishes okay so here is the last one we'll talk about the repeated steps we'll get back to store and retrieve later but for now we're just going to talk about three of the four this is another program and the key is that we're going to use this same choice where we're going to go in but then we're going to run for a while and then we'll have an exit condition where we get out so this is repeated over and over and over and over again and this is the essence of how we make computers do things that are seemingly difficult well they're more naturally difficult for people okay and so how do we encode this notion that we want to do something not forever but for a while how do we encode that notion and so we do it in this way so we have our statement
sequentially go to this while while is a keyword and it's asking another question that's a true false question is n greater than zero i read this as as long as n remains greater than zero keep doing this indented block and you have a colon at the end and then you have two lines of code that are indented so that tells us what the loop is and then this is now de-indented and so it comes in and if this is true it runs these two lines prints out n n is five and then it says n equals n minus 1 which makes n be 4 and it goes back up and it asks this question again is n greater than 0 if it is continue on and it prints 4 and then subtracts and it does that for 4 3 2 and prints out 1 then it comes up and now after this n is 0 and n is no longer greater than 0 so it takes sort of the exit ramp and goes down here and runs the next line now we're going to cover all this again so i'm just trying to give you the big picture next couple of chapters we're going to hit all these things again and we're going to hit them in much more detail with a lot better information this is now sort of like combining these and again i don't want you to really like know this stuff you will know this in a couple of weeks you will see this program again but this shows you how we combine those patterns of repeated sequential and conditional together so this is a bit of sequential code comes in here runs this which happens to ask for a file name then it opens the file it creates a data structure called a dictionary this is all sequential now the for is another form of loop so this is going to loop for a while and then this is within a loop we can even have two indents and that's another loop so these are like repeated and then it dials down to the next sequential bit then it does this here's another loop that's going to run and then here's a conditional it's going to run and then when
it's all done we print out the last thing and this of course is the program that figures out the most common word and prints that most common word out and so this is a python short story it reads some data it reads a name of a file it opens that file it talks about how to make a histogram and then it looks through for the most common word so don't worry too much about this over the next couple weeks we'll fill in the pieces so that you absolutely understand every single line of this code so just a quick overview chapter one uh stick with us uh you realize it will be chapter seven before this makes too much sense you really gotta trust that you are learning important things and that it all makes sense when we bring it together like in chapter seven in a few weeks [Music] hello and welcome to chapter 2. now we're going to continue to talk about the building blocks of python variables constants statements expressions etc the first thing we have to talk about is constants we call them constants because they don't change they're numbers strings etc and we use them to sort of start calculations or you know if something is greater than 40 hours we're going to do something and so 40 is the constant in that situation so we have 123 we have 98.6 we have hello world which is a string by enclosing it in quotes we pass each of these things to the print function and a side effect of the print function is that we see the output so print 123 prints out 123 print 98.6 prints it out so this is just really the syntax of constants and without constants we can't write really much of anything the other sort of foundational notion of any programming language is the reserved words and like i said before reserved words are these special words where python is listening for them and they have very special meaning so when python sees if it's not just any other word it's how python implements conditional
execution variables are the third building block and that is a a way that you can ask python to allocate a piece of memory and then give it a name and you can put stuff in that sometimes you just put one value later we'll see when we do collections in chapters eight and nine uh we will see that more than one value can be put into a variable and the variable the control how we control the variable is through the assignment statement and as i said before it's important to think of the assignment statement as having an arrow to it so this is not saying x for all time is the same as 12.2 what it's saying is take 12.2 find a place find some memory in your computer there mr python give it a label x we get to choose the x that's the variable part we chose it right um and then stick 12 in it and then the same is true for 14. go find an another spot name it y and then put a 14 in there so think of this as an arrow every time you see that equality the assignment in an assignment statement now these variables hold one value so now if we have these three statements these two and then the third one executes it says put 100 into x but that wipes out the old value of 12.2 and it rewrites it with a hund with a hundred and so we can change the variables that's another reason that we call them variable there are some names now some rules for making variable names you can start with a letter or an underscore we tend not to as normal programmers use underscore we tend to reserve those for variables that uh we use to communicate with python itself so when we're making up a variable we tend not to use underscores as a pre first character you can have letters and numbers and underscores after the first character and they're case sensitive but it's really a bad idea to use case as the only differentiator so in this case uh spam eggs spam 23 and underscore speed are all total legit we would probably not use this one unless we were actually doing it because python told us to use that 
variable uh 23 spam starts with a number pound sign is starts and dot is not a legitimate variable character and spam capital spam and all cap spam are different but this is not something that you want to sort of depend on too much so that's just the rule names we tend to start them with a letter and then use letters numbers and underscores underscores other than the first character are generally pretty common and you'll see those used a lot now when we're choosing variable names one of the things about variables is we get to choose the name we get to choose the name x choose the name y and so sometimes we like them short but sometimes we want them descriptive and the notion that of making variables descriptive is often confusing to beginning students sometimes it's really helpful to if you're going to have a line of text and you name the variable line that's great because the next person reading your program says oh that must be the line of text whereas it also can become misleading that line the name of a variable somehow has meaning so sometimes we'll have even singular variables and plural variables like friend and friends like is is plural does python know about singular and pearl plural and the answer is no so sometimes we pick variables that make no sense sometimes we pick variables that make a lot of sense this is just something that you as a beginning programmer are going to have to understand that we can pick anything we want and so you'll see i'll try to call attention to this in the first few lectures as we go through so here's a bit of code with an assignment statement two assignment statements a multiplication and a print statement and you can say what is this doing now python is perfectly happy with this code because it assigns it in there you have said please go give me this as a label and then we assign two variables and then we're carefully pulling these two variables back out multiplying them together and sticking them into yet another variable 
and then printing that variable out that seems like you know we can figure out what it is you just have to look really careful and a single character mistake and python is going to be you know pretty unhappy okay so that's one way to write this program it's hard though because you any of those characters are long variables and they're random stuff it's not very friendly to anyone who might read your program now this looks a little friendlier it's the same program because python just wants a correspondence you picked a you picked b and you pick c and it's really much easier for us to see what's going on and and so this is in a way going from here to here is much friendlier but we can be even friendlier if we pick mnemonic variable names so this is this is not mnemonic this is short and convenient this is long and inconvenient python is happy with any of these here on the other hand is another version of the exact same program and now you think to yourself oh yeah now i get it 35 is the number of hours 12.50 is the rate and then we're going to multiply the hours and the rate and come up with a pay and we're printing out to pay now whoever wrote this program is much is helping us greatly understand what's going on and that's good choosing variable names python again all three of these are the same to python choosing variable names in a way that help your reader understand what's going on is a great thing the problem is the danger is if you read this and you think that somehow python understands payroll that if you name a variable hours that python knows what hours means the answer is python really doesn't care what you name the variable as long as what you name it you use it right and so you've got to be careful and so you'll see i will when i write my code in these first few weeks first few lectures i will sometimes write it with gibberish i'll sometimes write it with extremely short but meaningless variable names and sometimes i'll use meaning full variable names 
and i'll call your attention to it and and it will get you you'll start when you look at this third kind it has meaningful meaningful variables or mnemonic variable names you'll just instinctively want to give python more intelligence than it sort of deserves i guess that's probably the best way to say that so we've talked about constants we've talked about reserved words we talked about variables and so here we have a sentence like we've already done some of these things where we set x equals 2 we retrieve the old value of x and add 2 to it so that becomes 4 and then we print 4 out print is a function that's built in and we pass in whatever we want to print out so this parentheses is part of a function call okay so an assignment statement you have to really get it your head around the notion that it has this arrow nature and that it evaluates this entire right hand side before we change the left hand side and so you can think of this sort of as at time step one it does this and then at time step two it does the copy and that's how you can have something like x on both sides of a assignment statement and so if for example we have x and x has 0.6 in it x has 0.6 in it what happens is is that it first it sort of ignores this part right here and evaluates the expression so it pulls the 0.6 everywhere x appears it pulls 0.6 out then it starts running these calculations and then it has the new value after all the calculations are done then and only then is it going to put that back into x and so it sort of takes that and puts it back into x and then wipes out the old value at this point this has all been taken care of and it's been reduced down to this 0.93 and so that is what's put in as the new value so up next we'll talk a little bit more about making more complex expressions [Music] so welcome back we're now going to talk about expressions expressions are a little more complex calculations that we can sort of do on the right hand side of an assignment statement so 
one of the things about expressions is operators and the operators in computer programming are often very much the same as the mathematical operators but we don't have all the fancy characters that we have in mathematics and so we have to choose what's on the keyboard and then we really go back to the 1960s and 1970s and then we used what was on the keyboard in the 1960s and 1970s to make these operators so plus is addition minus the subtraction we don't have a times sign or a or a dot in the middle so we use the asterisk as multiplication division we can't put two things over top of each other so we use slash for division raising to the power because it didn't have little characters back then is star star which is raising to the power and then remainder remainder is the when you do integer division it's also called the modulo operator it's the remainder not the quotient and i've got a picture of that coming up so here's a whole series of little examples of this right so we've already seen you know the plus x equals x plus one keep remembering that these assignments are arrows basically arrow arrow they have a direction multiplication 440 times 12. um dividing this by that's division over over a thousand five point two eight um here we're going to put 23 into jj and then we'll do modulo so that says take 23 divided by five and give me back the remainder and put it in k k so this is the expression that evaluates like this take 23 divide 5 into 23 4 remainder 3. 
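as a rough sketch of the operator examples just described (the doubled-letter variable names follow the jj and kk mentioned in the lecture, but xx, yy, zz and power are assumptions made up for this illustration, not necessarily what is on the slide):

```python
# arithmetic operators, sketched from the spoken examples
xx = 2
xx = xx + 2       # addition: grab the old value, add 2, store it back in xx
yy = 440 * 12     # multiplication uses the asterisk
zz = yy / 1000    # division uses the slash
jj = 23
kk = jj % 5       # remainder (modulo): 5 goes into 23 four times, remainder 3
power = 4 ** 3    # raising to a power uses star star

print(xx, yy, zz, kk, power)
```

the percent sign gives back only the remainder, not the quotient, which is why kk ends up holding 3.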
the 3 is what comes back up here okay and so that is the remainder it's also called modulo operator it turns out that for things like picking a random number and then taking the modulo of 52 is a way to pick a card randomly so this modulo operator is actually especially in games and other things super useful so that's the various operators it's important to know which of these operators goes first it's called operator precedence now normally we put parentheses in like you know the so if i put the parenthesis in here i'd say this goes first parentheses then this goes first oh actually not that one oops got that one wrong this happens first this happens then this happens okay and so but it's important for us to be able to know if there were no parentheses the order in which these things will happen so the way things work in terms of opera operator precedence is parentheses are the most important thing followed by raising to the power all else being equal multiplication and division are are all both equal and then addition and then within it's adding left to right so let's see an example of how this works and so if we take 1 plus 2 to raise to the 3 power divided by 4 times five and we print out what comes out of this so the way i did this when i was taking exams back many many years ago when i was first in computer science is i'd write it all down and i'd look for the highest precedence thing now parentheses would make this easy but exponentiation is the first one so that means we're going to take this and that's going to be eight two to the third power two times two times two two two cubed is eight then what i would do is i rewrite the whole thing with the 8 there and i look across and i'm looking for multiplications because the power's been done the multiplication is what i'm looking for next and then there is both multiplication division they're equal they're at the same level and so what happens is they're done left to right eight divided by four happens before 
four times five and so the fact that it's not four times five first but instead eight divided by four is because of the left to right rule so then this gets rewritten to be one plus two times five and this one multiplication is the top one so that does this next 2 times 5 becomes 10 i rewrite it again and then 1 plus 10 addition is the lowest thing and that's how we end up with 11. and so that's how i would do these problems if i ever saw the problem on an exam and it's a fun problem to put on exams because there is one and only one answer and every programming class has usually at least one slide about this stuff so like i said the rules go top to bottom parentheses power multiplication addition and then left to right within it so we talked about variables and computing values to put inside variables but the one thing maybe you noticed as we go by is we have different kinds of data we call it type is this of type integer is this of type floating point number is it of type string what is going on here and python is pretty smart about various kinds of types of data and so you know we're adding one plus four here and python knows as it looks at this that that's an integer and that's an integer and we'll add it together and make it an integer so that thing is an integer we can also use this plus to concatenate two strings this is hello blank plus there and plus looks here says oh that's a string and that's a string so i know what to do with strings i will concatenate those two things together so it becomes another string that gets assigned into eee and it's hello space there the plus doesn't add the space i added the space by putting it right there and so these operators are kind of smart in that they kind of know what they're dealing with and sometimes they will do one thing or another depending on the kinds of values variables or constants that they're working with and so sometimes type can get us in trouble so here we have eee which is hello
there because we've concatenated these two strings together and now we're adding one and the problem now is that it looks on one side and says that's a string and that's a number and says i don't know how to do that this is another one of those annoying errors where you think that somehow python doesn't like you but it just is confused if you look at these things traceback always means i quit it means i stopped i ran i'm quitting now because i don't want to go any farther because i've become confused so your program stops running and it says here's where i stopped running because we're typing interactively it's always line one here but you read carefully and you don't get too stuck on too much stuff line one that tells us something in module type error can't convert int object to str implicitly so that's an integer right there and that's a string and that's what it's complaining about that little bit right there if python is so grumpy about types then we should be able to ask it about types so it turns out that there is inside python a built-in function called type t-y-p-e so we can pass into type so the syntax is calling a built-in function named type in parentheses is the parameter that we're passing to it we're saying hey hello tell me something about the type of the variable eee and so this is a function the parentheses are part of the function call and it says oh that would be of class string and then we can pass in a constant says hey what about hello the string hello it's like oh that's a string too what about a one well that's an integer and so we are asking python through the type function what the type of either a variable or a constant is and there are even several types of numbers and we'll even see booleans and others later like one with no decimal that's an integer number 98.6 with a decimal that's a floating point number and so you know constants can be both integer and floating point
and i'm just asking over and over and over again what is the type of what's in xxx what's the type of what's in temp and what's the type of the constant one and what's the type of 1.0 you can also use a set of built-in functions like float and int to convert from one to another and so this basically says i want to convert 99 to a floating point number so this is a function and it's participating in this plus but before i can finish the plus it turns this into a 99.0 the difference between 99 as an integer and 99.0 is that it's a floating point number and that actually changes this computation as it looks to the left and looks to the right it says oh i've got a floating point number on one side and an integer on the other side and so i'm going to make my calculation overall be a floating point calculation i can also pass a variable into the float function i can say take this variable i which has a 42 also an integer and then give me back a floating point so that'll be 42.0 put that into f we print it out and it is indeed 42.0 and it's a float and so it knows the type and value in any variable this is an integer of value 42. this is a float of value 42.0 um integer division in python 2 was kind of weird and it was actually one of the big things that they changed between python 2 and python 3 this is a python 3 course so we're not worried about that too much what's nice about integer division in python 3 is it always produces a floating point result and that means that python 3's division is more predictable and it works more like a calculator so in this case i mean you can go back and look at my python 2 lectures and see how crazy it was in python 2.
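here is a minimal sketch of the type questions and conversions described above, assuming xxx holds the integer 1 and temp holds 98.6 (the lecture names these variables but the values used here are a guess at the slide):

```python
# asking Python about types (values of xxx and temp are assumed)
xxx = 1
temp = 98.6
print(type(xxx))        # <class 'int'>
print(type(temp))       # <class 'float'>
print(type(1))          # <class 'int'>
print(type(1.0))        # <class 'float'>

# float() turns the integer 99 into 99.0 before the plus happens,
# so the whole calculation becomes a floating point calculation
print(float(99) + 100)  # 199.0

# converting the integer stored in a variable
i = 42
f = float(i)
print(f)                # 42.0
print(type(f))          # <class 'float'>
```

the variable i is left untouched by float(i); the conversion hands back a new floating point value, which is what gets stored in f.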
10 divided by 2 is 5.0 and the weird thing here is these are both integers but the division forces the result of the calculation to be a floating point number and this you know 10 over 2 could be 5 but 9 over 2 is 4.5 and so that is accurate in old python 2 that would give us back 4 which is completely unpredictable and weird the same with 99 over 100 as you would expect if this were a calculator you get 0.99 actually what you get in python 2 is zero because it would round it down it doesn't i mean it doesn't round at all it truncates it so 99 over 100 is 0.99 and then it truncates it to zero that's python 2. we're not talking about python 2. there's a good reason we're not talking about python 2. welcome to python 3. of course if there are floating point on either side the result is still a folding point floating point and the result is still a floating point so integer division produces a floating result in python 3.0 not in python 2.0 that is an improvement in python 3.0 and that's why we're recording these lectures i have a whole great set of lectures about python 2 and now i'm going to have a great set of lectures about python 3. welcome to python 3. okay so we've been talking about converting from integer to floating point but you can also convert from string to integer or string to floating point and so here we start out with a little string value now it only works for strings that are made of digits so quote one two three quote is not an integer it is a three character string that has one two three as the characters in that string which is very different than 123. 
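a small sketch of the python 3 division results just described, followed by the digit string from the end of that passage. the double slash operator is not mentioned in the lecture; it is included here only as a way to see the whole-number result that the old truncating behavior would have given for these positive integers. the name sval is just an illustrative choice.

```python
# division in Python 3 gives a floating point result even for two integers
print(10 / 2)       # 5.0
print(9 / 2)        # 4.5
print(99 / 100)     # 0.99
print(10.0 / 2.0)   # 5.0

# not from the lecture: floor division (//) keeps only the whole-number part
print(9 // 2)       # 4
print(99 // 100)    # 0

# a string made of digits is still a string, not the number 123
sval = '123'
print(type(sval))   # <class 'str'>
print(sval == 123)  # False
```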
we say what is the type of this it's a string we say let's add one to it and it says can't convert into string so that blows up right because this is a string it looks to both sides string plus an integer not good okay but we can convert this we can call the int function which is like the float function and pass a string in so it says hey take this and turn it into an integer so take the input of s val which is the string 1 2 3 and give me back an integer representation of that which is going to be 123. so we say what kind of thing do we get back we got back an integer we can now add one to it and get 124. and so you have to manage the type of things and you can convert from one type to another now int is not magic if you send something into it a string that has doesn't consist of digits then you're going to end up with another error invalid literal for integer with base 10 blah blah blah blah so it's really complaining it says i want these to be numbers here and you just gave me letters so that's going to cause this to fail another thing that we're going to do with variables is just like the print function takes something a list of things in this case a string comma a variable and then print some output in the program the opposite that is input actually input generally happens before output input is a built-in function and we pass to it a prompt a string of text that's going to be printed out for the user and then it stops and waits so it says who are you and then right here it just sits waiting for us to type something so we type blah blah blah blah and then hit the enter key right we hit the enter key and then this text ends up in this variable so this is an assignment statement that chuck is the result of the input call gets copied into the nam variable so let's do that again it's evaluating an assignment statement remember it's kind of this way or you can think of it as do this just this right side first it it writes this out writes that out then it waits wait 
wait wait wait wait until we hit the enter and takes this chuck and that becomes the result of this input which is then assigned in to nam now then we go sequentially to the next line it prints out welcome comma and a contents of the variable nam now this one this comma here actually does put the space in here automatically so it says welcome space chuck so it pulls the there's no space in chuck just just the chu ck and so print can take more than one thing separated by commas matter of fact print can have uh you know a whole bunch oops come back come back back print can have comma comma comma parenthesis as many as you like everything you've seen up to now is kind of one thing in the print but that doesn't mean the print only can do one thing so i've talked about variables we talked about constants we've talked about input we've talked about output and now it is time to write our first meaningful program and so this program has to do with those of you who have uh traveled internationally if you traveled to united states and you traveled outside the united states you notice that there is an elevator convention that is different inside the united states the united states the walk in the ground floor in the elevator that's one and if you walk into ground floor in europe or many other places in the world and the elevator is zero so we have written a small app that we're going to put on the app store and get wealthy with with called elevator floor conversion app and it it's going to ask us we're in europe and we're lost and this and you say well what floor would this be if i was in the united states of america and so here's we have to read the floor that we are at at in europe and then we're going to convert it to a u.s floor and then we're going to print it out this is very silly but it is a pure essential program that has input does some kind of task on that input and then produces some output which is useful for some value of useful okay so let's take a look at how 
we combine everything that we learned in this lecture: input, processing, and output. It's a three-line program, but it's sort of the beginning of something that programs do. You're going to write lots of programs that do this. So here we go. The program starts and we do the input. As a side effect it prints out this prompt and then waits. We type in zero, and that comes back here. And the zero is a string: input gives you back a string, it doesn't give you back a number. It's a little different in Python 2, but in Python 3 input gives you a string. So quote-zero-quote, which is what we typed here (we didn't type the quotes), is a string, and it gets stored in the inp variable. Then we move to the next statement, and on this right-hand side we convert that string variable to an integer, so that becomes the integer zero. We add one to it, that becomes one, and then we assign that into usf. I've named this variable "United States floor", right? So inp is the input and usf, that's mnemonic. Python doesn't know anything about elevators; I just picked a variable name that was quite friendly. At this point usf has the United States floor that's equivalent to the European floor. Then I just fall down and do a print statement: print out "US floor", comma (that's this space right here), and then whatever the contents of the usf variable are.

And you could see that I could type in four and it would say three, I could type in seven and it would say six. This is an amazing program, it converts floors in a European numbering scheme... wait, actually, no, I got that wrong. Hang on, let me clear this. I wasn't thinking clearly. I could type in 4 and it would give me back five; I could type in six and it would give me back seven. See, I'm confused. I haven't been in Europe in a couple of months, and so I forgot all about the floors. But that's the idea. Now, this is a super, super simple program, not super useful, but you get the idea that we're going to pull some data in, we're going to do some
intelligent thing (soon this will be hundreds of lines of code instead of one line of code), and then we're going to present the results to our user.

Now, another element of most any programming language is what's called a comment. A comment is a way for you to put in a program file some text that's to be ignored by Python or C or whatever language we happen to be using. In Python, comments start with a pound sign. You can put a pound sign anywhere in a line, and Python ignores everything after that pound sign. It can be the first character. So here's our recurring example that we talk a lot about. We're not going to cover this now; remember what this does? This is counting words: how many "the"s there are (there's 16 of those), or in that file there were six "to"s, or whatever it was. This is that code. We'll get back to this code, but what I've done here is add some comments that are really for human consumption. This first paragraph is "get the name of the file and open it". The second paragraph is "count the word frequency" (you know, maybe I should have said histogram here: count the word frequency and assemble a histogram). And then here I'm putting this pound sign in: "find the most common word". And then I'm all done and I print the stuff out. So all I'm saying is comments are for people to read: the next programmer, or the person who's going to change your program after you're done with it. They're nice, and you don't have to use any particularly weird syntax or variable naming conventions. You put a pound sign in and you can write anything you want from that point forward.

Okay, so we've talked a little bit about variables and types and mnemonics and how we choose variable names, how expressions work, the various operators, converting between different types, printing, input, output, and comments. That just kind of gets us sentences, and coming up next we'll talk about conditional execution, where we're really
starting to move up to paragraphs. So, see you in a bit. [Music]

So hello, welcome to Python for Everybody. I'm Charles Severance, your instructor, and in this video I want to work through exercise 2.2. It's just a basic kind of hello world, and so I'm going to show you how I want you to do these things. I've got this folder called Python for Everybody, and I'm going to use the Atom text editor. The whole thing is it's supposed to say hello: "Enter your name". There's the last one we did. Okay, so in Python for Everybody I'm going to make a new folder and call it ex0202, to make myself an exercise. I'm going to create a new file and say print hello, and then File, Save As. It's important to get these in the right folder. I'm going to call this ex02.py. Always end these in .py, because then you get the syntax highlighting.

Okay, so now, again, you're not going to like this, but I want you to learn to execute these things using the real tools that we're really going to use, because when you're applying this I want you to know how to actually do real work. So I need to get to this terminal program. I could say start, run terminal, that's another way to get the terminal program started, but I've got it down here in my dock, so that saves me some time. And of course if you're on Windows this is the command prompt. In Windows you type cd, but on Mac I type pwd, and that shows where I am in the directory structure. I can go into my Desktop (this would work the same) and I can do an ls; in Windows I would say dir. I see this desktop folder, and I'm going to go into py4e. If I keep doing pwd, you see I'm navigating deeper and deeper into this directory structure. Then I do an ls here and I'm going to cd one more time into a folder. Oh, I did that really quick and didn't tell you what I was doing
cd ex, Tab. That works because if the name of the file or folder is unique, you can hit Tab in the command line to complete it, and away we go. So now we're in this folder: in the Users folder, py4e, Desktop, py4e, ex0202. Then I say ls. And that's what you want to get to: you want to get to where you know where you're at in the folder structure and you know the file that you're going to run. I've saved this, and so if I say python3 ex02.py, it runs. Now I know that what I'm doing in this screen can be run in this screen. Oh, I did that again: I hit the up arrow and I can run it again. So you'll find that the way you write these programs is you write them here and run them here. Again, there are shortcuts, and those are crutches, and you might like them, but I just want you to learn how to use your computer like a real person uses the computer, because when we start dealing with databases and files you're going to want to know where you're at on your computer. So if there's some fancy clicky button thing that says "automatically run Python", that's convenient, but I want you to know all this stuff. Okay, enough about that. Well, probably I'll just keep talking about that all the time.

So now we've got our problem: write a program that uses input to prompt for the user's name and welcomes them. This is pretty simple. Go back to Atom (come on, Atom). Okay, so we have to have a variable; nzt will work well. Later we'll talk about choosing variable names; right now I'm going to choose a crazy variable name. input is a function, and then you put a prompt, and that's supposed to be "Enter your name", colon. I don't put a space in there. So I'm going to save that and just run it for yucks, and there you go. That's one thing you'll get used to: just type in gibberish really quick. And then we're going to say print. What have we got to say here? print hello, and go pull that variable back out. Save that. Command-S is how I'm saving; I saved it, so I
didn't do File, Save (I could have said File, Save), but Command-S saves it. Now I can run it again: zap, "hello zap". So now I can run my thing. Oh, and Command-K clears it. I don't remember what it is on Windows, but there's a way to clear your screen; I like to keep my screens clean. "hello chuck". And so we have completed exercise 2.2 with a little program. Oh, and by the way, when you see me switching windows I'm using Command-Tab, and the same thing works on Windows as Alt-Tab. Okay, so I hope that was helpful to you, just to kind of walk through one of the assignments, assignment 2.2 in Python for Everybody.

Hello and welcome to Python for Everybody. My name is Charles Severance. In this short video I will be explaining how to do exercise 2.3, where we prompt for some hours, prompt for some rate, multiply them together, and print them out with a little pay message. So this is 2.3. Some of you will immediately want to go to the autograder and do your homework there. I really would rather you didn't do that, unless of course you're doing this on an iPad or an Android or something where you can't install Python. You have to realize that the autograder isn't forever: you can only go so far with the autograder, and eventually you have to write a real Python program. So I'll eventually show you how to run this in the autograder, but instead I'm going to show you first how to run it in the terminal. I'm first going to go into my Python for Everybody folder and make a new folder (Command-Shift-N is what I just did there): ex0203, for exercise three. I'm also going to go into Atom, which is my text editor, and see, I have that folder. So I'm going to make a new file, and I always say print; I'll just say py4e (oops), py4e. Then I'll say File, Save As, and I want to make sure it's in here, and it's going to be ex, underscore, 02. I don't like putting spaces in
file names. Some operating systems can handle them, but that's why I'm using underscores here, so I would avoid using spaces in file names. As soon as I give it a Python suffix, I'm there. And so it shows up there on my desktop, and now I'm going to run the terminal program so that I can get there: cd Desktop, cd python for everybody (that's the folder on my desktop), and if I do an ls I see a couple of folders and a file. You can say ls -l and see a little more detail: these two are folders and this one's a file. So change directory, cd ex0203, and now I'm in that folder. If I do an ls -l I see that file; I can also do an ls without the -l and see the file. And now I say python3 ex0203.py and it runs. No matter how many times you watch me, you'll see the first thing that I do is get to the point where I know I'm in the right directory and I can run a little hello program before I start coding. I just don't like being crazy, right?

So now I'm going to go back and take a look at my assignment. "Enter Hours": you've got to prompt for hours, ask for a number. "Enter Rate": prompt for rate. And then calculate pay. So there's a couple of input statements here. xh is my variable; later I'll choose more effective variable names, but for now I'm going to make them silly. "Enter Hours", colon, space. Then I'm going to copy and paste and call this xr, for rate. When you're doing this you need to be very careful. And so now I'm going to calculate xp, which is xh times xr, and then I'll say print "Pay" (oops, I don't need to put a space, because this comma effectively creates a space), xp. Then I'm going to save that, switch to my terminal program, clear my screen, and hit the up arrow, because I already typed python3 ex0203.py. For my hours I'll just start with something really simple that I can calculate in my head: 10 and five. Whoops: can't
multiply sequence by non-int of type 'str'. Here we have a traceback, and again I encourage you to realize that these tracebacks are not personal attacks by Python on you, even though they might be frustrating. The way to parse this is to start by saying: line three, something's wrong at line three. It's pretty good at knowing what line it is; it's either that line or the line above it. And it's something about multiplying. What it's really saying is "I'm confused, I have to stop, because I cannot understand your instructions". The problem here of course is that this is of type string, and you can't multiply a string times a string. So we can convert this using float. That's a function call: we're passing the string xh in, and the value we get back is the floating-point version of that. Then we call float for this one as well. Now I'll save that (always remember to save) and run it. So I'm going to run with my hours of 10 and my rate of 10, and it's a hundred, so that looks pretty good.

Okay, so let's go ahead and try to run this in the autograder. My idea is you'll take this and copy it, go back to the autograder, and just paste it in. It says use 35 hours and a rate of 2.75. So let's check the code: 35 hours, okay, 75.
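The fix described above can be sketched as follows. This is a minimal illustration, not the instructor's exact file: the helper name compute_pay is hypothetical, and literal strings stand in for what input() would return so the sketch runs without typing anything.

```python
# In the exercise the two values would come from the keyboard, e.g.
#   xh = input("Enter Hours: ")
#   xr = input("Enter Rate: ")
# input() always returns a string, so "10" * "5" raises a TypeError.

def compute_pay(hours_text, rate_text):
    """Convert both input strings to floats, then multiply them."""
    hours = float(hours_text)   # "35"   -> 35.0
    rate = float(rate_text)     # "2.75" -> 2.75
    return hours * rate

# The comma in print() supplies the space after "Pay".
print("Pay", compute_pay("10", "10"))
print("Pay", compute_pay("35", "2.75"))
```

Converting right after reading the input keeps the rest of the program working with numbers instead of text.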
Oh no, 275... 2.75. And so it's running, and it works, and of course now I've got my grade. So this idea, where you work here to get your assignment done correctly and then you run it in the autograder, is the way I intend for you to do it. But again, if you can't do it that way, it's a great way to get started to just write your code in the autograder; you can change your code in the autograder and then run it again. Of course, this is going to fail: 35 and 2.75, and you get a mismatch, and now it's angry at you. The mismatch here is because I print "Howdy Pay" rather than "Pay", and it's real picky about it. You think, oh, I got the 96.25 right. Well, it doesn't really care so much about that. So let me go ahead and fix this and run it, so we leave on a successful note: 35 hours and 2.75 as the rate per hour (it's kind of a low rate per hour), and we're successful. Of course, that means that you now have a grade on assignment 2.3. Look at that, I got a grade on assignment 2.3 (unless of course you're running this in some other environment). Okay, thank you so much, and I hope that this has been useful to you.

Hello, and welcome to chapter three, conditional execution. In conditional execution we meet the if statement. The if statement is where Python can go one way or another way, and it's the beginning of our way of making Python make decisions for us. With sequential code we just do some things, and sometimes that's useful, but now we can have our code check something and then make a decision based on that thing. The conditional steps in Python are pretty straightforward. The keyword that we're going to use is if; if is a reserved word. The if statement has as part of it a question that it asks, and this one is asking "is x less than 10?". The colon is the end of the if line, and then we begin an indented block of text. The way this works in this particular example is this: this
line is the conditional line. If the question is true the line executes, and if the question is false the line is skipped. You can think of it this way: x is 5; ask a question, is it less than 10 or not? These questions do not harm the value of x. If it is, then we run this code and then we sort of rejoin here. Then we test this next if, and if that's true we do this code and then continue there. But in this case it's going to be false, because x is not greater than 20, and so it just continues down here. So if we look at how this works: it runs this line, then it sees this question and skips that line, so this line does not run. So "Smaller" prints out and "Finis" prints out. Okay, so that's the basic idea of an if statement and the indentation: when we are done with an if statement we de-indent back. There's this little block: this is one if statement and this is another if statement, and these are the two conditional lines that either run or don't run depending on the answer to that question.

We have a number of different comparison operators that we can use to ask these true/false questions. Again, we're kind of limited to the keys that were on computer keyboards in the 1930s, 40s, and 1950s. Less than; less than or equal to: we didn't have fancy math characters, so we just concatenated less-than and equals to be less than or equal to. This double equals is asking "is this equal to?", and that's a little tricky, because the single equals sign is the assignment operator. If I were building a language today from scratch, I would probably make assignment be an arrow and have the equality question be a plain equals, or I might say question-mark-equals. But I'm not building this language, so it's not up to me. So double equals is asking the question "is equal to". Then greater than or equal, greater than, and not equal: this is the exclamation point followed by equals, it's sort of like not equal, so that's
sort of not equal, so that's how we do not equal. If we take a look at some examples, all of these are going to be true because of the way x is set. If x is equal to 5 (that's the question version, true or false), it'll execute that. If x is greater than four, it's going to execute that. If x is greater than or equal to five, it's going to execute that. Here's kind of a shorthand: if there's only one line in the block, you can pull it up right onto the same line after the colon. If x is less than 6, which is true, execute that. Then if x is less than or equal to 5, do that, and if x is not equal to 6, do that. Like I said, all these questions have been carefully constructed so that they're true, just to show you the syntax of those comparison operators.

Now, you don't have to have just a single line in the indented block, and this will be something you're going to get used to. If we indent more than one line, then the conditional code is actually these three lines. The idea is you have an if statement, you come in, you do an indent, and as long as you stay indented you stay in that if block. If it's false, it just skips all of those. So the way this is going to execute: x is five; print "Before 5"; is x equal to five? That's the question, and that's true, so it's going to run all of these and then come back and continue on at the de-indent. So all this stuff runs. Then it says if x equals six. Well, that is false, so it skips all of them, and none of these lines of code run, and it says "Afterwards 6". So that's a mistake right there; those don't run, because x is not equal to six.

Okay, so indentation is an essential part of Python. We use indentation in lots of programming languages to demarcate blocks, to show where blocks start and stop, but in Python it's syntactically significant: you get an error if your indentation is wrong
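The two walkthroughs above can be sketched in one short script. The slides are not part of the captions, so the message strings here are assumptions apart from "Before 5" and "Afterwards 6", which are quoted in the narration. The lists named passed and trace are hypothetical helpers used only to record which lines ran.

```python
x = 5

# Every comparison below is true for x = 5, matching the narration.
passed = []
if x == 5:
    passed.append("equal to 5")
if x > 4:
    passed.append("greater than 4")
if x >= 5:
    passed.append("greater than or equal to 5")
if x < 6: passed.append("less than 6")   # one-line shorthand after the colon
if x <= 5:
    passed.append("less than or equal to 5")
if x != 6:
    passed.append("not equal to 6")
print(passed)

# A multi-line block: every indented line belongs to the if above it.
trace = ["Before 5"]
if x == 5:
    trace.append("Is 5")
    trace.append("Is Still 5")
    trace.append("Third 5")
trace.append("Afterwards 5")

trace.append("Before 6")
if x == 6:                      # false, so the whole block is skipped
    trace.append("Is 6")
    trace.append("Is Still 6")
    trace.append("Third 6")
trace.append("Afterwards 6")
print(trace)
```

Changing the first line to x = 6 and rerunning shows the second block running and the first one being skipped.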
after an if you must indent, and you maintain the indent as long as you want to be in that same if block. Then when you're done with the if block, you reduce the indent. In this rule of indenting, comment lines and blank lines are completely ignored. We tend to use four spaces; four spaces ends up being the normal thing that we do, and you'll see all the code that I write has four spaces for each indent. If I go in twice, I use eight spaces. Now, we have this instinct of wanting to hit the Tab key to move in four spaces. The problem is that it might look the same on your screen: a tab and four spaces might line up at the same place depending on how tabs are set, but Python can get confused by that. So we tend to avoid using actual tabs in files. In most programming text editors, like Notepad or TextWrangler, there's a place to set the tabs to say "don't put tabs in this document, but every time I hit Tab move over four spaces", so if you hit Tab it's like space, space, space, space. Now, the nice thing about Atom (and this is the text editor we tend to recommend in this class, because it works on Windows, Linux, and Mac, but also because it automatically sets this up): as soon as you save your file with a .py extension, you can hit the Tab key with impunity and everything works perfectly. But the key thing here is that Python insists that you get this right, and if you don't get this right you're going to get indentation errors, and they're just another syntax error. So if you're using something like TextWrangler or Notepad, run around in the preferences and you'll find something about expanding tabs, or maybe how many spaces each tab stop is supposed to be. You check these, and what this really is doing is telling your text editor "never put an actual tab in the document, but simulate tab stops using spaces". And so here is a bit of code
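The code on screen is not captured in the captions. What follows is a reconstruction of the shape the narration walks through next (a line, an if with two indented lines, a de-indent, a blank line, then a for loop that runs five times with a nested if). The specific values and messages are assumptions.

```python
x = 5
if x > 2:                       # colon: the next line must be indented
    print("Bigger than 2")      # one indent: part of the if
    print("Still bigger")       # same indent: still part of the if
print("Done with 2")            # de-indented: the if block has ended

for i in range(5):              # runs five times, i = 0, 1, 2, 3, 4
    print(i)                    # one indent: part of the for
    if i > 2:
        print("Bigger than 2")  # two indents: the if nested in the for
    print("Done with i", i)     # back to one indent: in the for, not the if
print("All Done")               # same level as the for: runs once at the end
```

Each block runs from the line ending in a colon up to, but not including, the first later line that returns to that line's indent level.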
it's got a nested block, but it gives you the sense that you have to be very explicit, when you're reading Python code, about whether the indent between two lines stays the same, increases, or decreases. Every time you increase it you mean something, every time you decrease it you mean something, and literally if it stays the same you mean something as well. So if we take a look at this: here we have a line, and the next line has the same indent. This is an if with a colon at the end, so we have to increase the indent, and now we're maintaining it. So these two lines are part of that if. But now we have de-indented. Where you do this de-indent affects the scope of how far this if statement lasts: it lasts up to, but not including, the line that's de-indented to the same level as the if. Okay, so this is a de-indent. Now we have a blank line, which doesn't matter, and we maintain the indent. Then we have a for, which we'll learn about in the next chapter; it's a looping structure. This for runs five times. It has a colon, and it also expects an indented block. Now we have what's called a nested block, where we have an if and a colon and we go in some more, so this is like two indents: these are one indent and these are two indents. So this is a block within a block. Then we de-indent, which means this print is not part of the if statement but is still part of the for statement. Then we de-indent again, and that means this print is on the same level as the for statement. So if you start thinking about this, you want to see that each block runs from the line with the colon up to, but not including, the line that's been de-indented. The for goes up to but not including the line that's de-indented; the if goes up to but not including
the line that's de-indented. So as you do this, you'll mentally start drawing these blocks, and pretty soon you will start constructing them as blocks. It takes a while, but it doesn't take forever. In Python, unlike other languages, this is very important and it matters, and you can have syntax errors if you get it wrong, because you're really communicating the shape and structure of your code using these indents and de-indents.

We already saw a nested indent; this is a nested if. You can put an if within an if, and you can go as deep as you want to go, like Russian dolls. So here we have x equals 42. If it's greater than one, we indent once, and this next line is at the same level of indent. But now we see another if, and it has to indent further, so this is two indents, eight spaces in. And then we de-indent back; actually we de-indent back twice. If you watch how this works, it runs to here (oops, back up), comes in here, and the answer is yes, x is greater than one, so it prints this. Is x less than 100? Well, it's 42, so the answer is yes, so it runs this, and then it continues back to there. You can also think of drawing boxes around this: this is one if box, and within that if box there is another if box. Again, it's the indented block up to but not including where the de-indent happens. And this here is like two de-indents at once, so it ends two blocks. Two blocks are ended by where we place this line. We could move it in or out; we could have it all the way in here, or here, or here, and where we put that line determines how the ends of these blocks work out.

So that's a one-branch if, which we just saw. But you can also have what's called a two-branch if. The basic idea of a two-branch if is that you're going to come in, you're going to ask a question, and you're going to go one direction if it's
yes and another direction if it's no. We call this an if-then-else. It's kind of like a fork in the road, and the way to think about it is, depending on the outcome of this question, we're going to pick one of these two. If we pick one, the other one's never going to happen, so it's an either/or: we're either going to go one way or the other way, but there is no path where we somehow go through both of them. That doesn't happen. The syntax that we use for this is what we call the if-then-else. The first part is a normal if with an indent, then we de-indent, and then there's another reserved word, else, with a colon, and then we re-indent. So this really ends up being part of one whole block here. This is the part that runs if it's false and this is the part that runs if it's true: the first indented block is what runs if it's true, and the second indented block is the one that runs if it's false. So here we go: if x is greater than 2 (in this case it's yes), we're going to print "Bigger" and we're going to be all done. So this one did run and this one did not run. Basically, with an if-then-else one of the two branches is going to run, but there's no case in which both branches run. Again, you draw these blocks around these things mentally, and in this one the block goes from the if (the else is really part of the block) up to but not including that print, which is de-indented back to the same level as the if statement.

Okay. Python is actually one of the more elegant languages, even though after a while, when you get too far in, this indenting gets a little bit complex. But this is a good way to visualize it, with these indents. Coming up next we're going to talk about some more complex conditional structures. [Music]

So welcome back. Let's talk a little bit more about some more complex conditional
statements that build on this concept of if and if-then-else. The first thing we're going to look at is the multi-way branch. The idea is it's kind of like the if-then-else, where you're going to pick one of two, but now we can pick one of three or one of four or one of five. It introduces a new keyword called elif; elif is another reserved word in Python. The way it works is probably best seen by looking at this picture. It checks the first condition, and if it's true it runs that block and then it's done. It doesn't check them all. It's not like it sees that there are two logical conditions and evaluates both; it checks the first one first, and how you order these matters, as we'll see in a bit. So if the first one is true, it runs that. If the first one is false and the second one is true, it runs this one and it's done. And if neither of them is true, it falls through, and there's an else clause that is the "otherwise", and it runs that. So basically it's either going to run one and then skip the other two, or it's going to skip one, skip two, and then run this last one. It only runs one of them in this case. The important thing is that it checks these questions in order, and it doesn't check the second question until it knows the first question is false. If the first question is true, you're done with the whole block at that point. So only one of these three is going to execute in that block.

So here are some examples of this. If we have x equals 0, it comes down here: x is less than 2, that's true, so it runs this code and then skips down to the end. On the other hand, if x is 5, then this is false and it skips that; it checks this, this is true, it runs this code, and then it's done and skips to the end. It goes like
false, true, run, end. And then if x is 20, for example, it goes false, false, run the else clause, and you're done: skip, skip, else, run that code, and you're done. So in this case we ran that, and we didn't run that, and we didn't run that. Again, one of them is going to run. These questions are checked in order, not out of order. It doesn't look ahead; it just checks in the order that you wrote them, and you're the one who wrote that order.

There are a couple of variations on this multi-way branch. You can have no else, as in this case, and this just means that it might not run any of them. In this case x is 5, so it's not less than 2, but then it runs this one. But if x was 50, for example, then this would be false and skip, and this would still be false and skip, and neither of these two would run. So if you don't have an else, you're not guaranteed that one of them is going to run, because else is the catch-all: if the other ones are all false, then the else is the one that runs. Similarly, you can have many elifs, but this is where it's really important to make sure you know what order they're being checked in. If this first one is true, it runs and goes all the way to the bottom. If it's false, false, false, true, it runs this one and it's done. If on the other hand (oops, go back, go back) they're all false, false, false, false, false, then it runs the else. This one has an else; this one didn't have an else. They don't have to have one. The key is you can have more than one of these elifs.

Okay, so I've got a couple of little puzzles, and I'll let you pause right now and look at them. The question is, looking at the three or four lines of code after "x equals something": are there lines of code that will never execute, regardless of the value of x? I'll let you pause and think about it, and then I'll explain it to you. Okay, hopefully you paused
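As a side sketch of the multi-way branch just described (not the puzzle itself): the first threshold, 2, and the test values 0, 5, 20 and 50 come from the narration, while the second threshold, 10, the labels, and both function names are assumptions. Wrapping the branch in a function is only a convenience for trying several values of x.

```python
def size_label(x):
    """Three-way branch: exactly one of the three blocks runs."""
    if x < 2:
        label = "Small"
    elif x < 10:          # only checked when x < 2 was false
        label = "Medium"
    else:                 # catch-all: runs when every question was false
        label = "Large"
    return label

def size_label_no_else(x):
    """Same questions without an else: possibly no block runs."""
    label = None          # stays None if both questions are false
    if x < 2:
        label = "Small"
    elif x < 10:
        label = "Medium"
    return label

for x in (0, 5, 20):
    print(x, size_label(x))

print(50, size_label_no_else(50))   # neither branch runs for 50
```

The questions are asked top to bottom, and the first true one ends the whole chain.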
and thought about it as long as you like, so let me now explain it to you. We come in here, and if x is less than 2 it's going to run this first thing, and if x is greater than or equal to 2 it's going to run this, and if neither of those is true then it's going to run this. Well, the weird thing is that all numbers are either less than 2 or greater than or equal to 2. I carefully constructed this so that it would never run this line of code. It is either going to run this one or run that one, but it's never going to run this one. So that was kind of a weird, dysfunctional one that I constructed.

This other one is a little different. If x is less than 2 we do this, if x is less than 20 we do that, if x is less than 10 we do that, and if none of those is true we do that. The problem here is between these two lines. If something's less than 10, like six for example, it's also less than 20. So even though there might be values for which this is true, those values are also going to make this earlier one true. So for something like six, it's going to run here, and it's not even going to look at this. That's the point: it doesn't even look at this one. I could have made this more sensible if I had moved this little block of code up to there. So this is where the order in which you choose your questions, the way you put these elifs together, matters, because it doesn't look at all of them. As long as it sees falses it keeps going to the next one, but as soon as it doesn't see a false it doesn't continue.

The last conditional structure we'll talk about is the try and except structure. If you know any other languages like C++ or Java or JavaScript, you're like, well, that's kind of an advanced concept. But it turns out in Python, because of Python's propensity to throw tracebacks in situations where you'd kind of like to recover, it turns out you kind of have to use it
a little more and a little earlier in your programming skill. So the problem is: what if there is a line of code and you absolutely know it's going to make a traceback, it's going to blow up, but you don't want it to blow up? I mean, I don't want to have code blow up. If you're using my autograder and you see a traceback in my autograder, I consider that a failure. I could put out an error like "hey, you entered blank data" or "you didn't enter a number", but a traceback just seems like I'm too lazy as a programmer. So we as programmers are supposed to anticipate parts of our code that are going to blow up, potentially based on the user's input, and then do something about it, and that's what the try and except are for. You take this little dangerous piece of code that might break and might blow up and you surround it with a try, which says this might blow up, and if it fails run this code down here. Okay, so that's the try, and the except is kind of like "if you get an exception". So here's a little bit of code. We put hello bob in and we convert it to an integer, and we know from past experience that this blows up, right? You can't take hello bob and convert it to an integer; it is just going to blow up. And here we are: it says you blew up on line two, that's great, and I'm not very happy with hello bob, and whatever. But the important thing is your program stops. These other lines, they don't exist, right? It doesn't go any further. Remember, the traceback is Python saying I'm really confused and I don't know what to do next, so Python is just going to be conservative and stop. So Python stops and your program stops. No matter how much error checking you put down here, it doesn't matter, because it's gone, it's all gone. And like I said, we take this kind of personally, because the code that you write is like, you know, you being put
into the computer, giving it instructions, and if the code blows up, well, that sort of wipes you out. You're not in the game anymore, you're not able to do anything. So especially in these situations where we can anticipate an error that might happen in the normal course of your program's execution, it might be something that you want to compensate for, and that's what the try and except does. So here's a bit of code for the try and except. We just have two little bits of straight-line code. We put a string in here that's hello bob and then we're going to convert it to an integer. This is the dangerous code; this code, in this case with hello bob, is going to do a traceback. And so we say try, and then we indent the dangerous code, and then we add this little except bit. If it works the except is ignored; if this blows up it runs the except. So in this code it's going to come in, it's going to try this, this is going to blow up, but instead of giving a traceback it's going to say, oh, I've got an available except, I'm going to run this except code and then I'm going to continue on. And so that prints out first negative 1, because we set this variable istr to negative 1, like a little flag telling us that something went wrong. And then we keep on going, and now we have put in the digits one two three, and now it's going to work, but we still have it in a try block. This one works, it does not blow up, and then it ignores the except block. So the except block is only triggered when something goes wrong in the code; it is ignored if something doesn't go wrong. It's like you bought an insurance policy on this line of code, and when things go wrong your except block springs into action and does whatever it is that you want it to do in the case of an error. Okay, so that's a pretty useful thing. You've got to be a little bit careful that you don't overuse it, because if you put more than one line inside the try
part and one of the lines blows up, it doesn't come back to the try block. In this one here we have kind of a simple silly one where we set the string, we're worried about some stuff. Well, the print statement's never going to blow up, so it's a bad idea to put it in the try except anyway. Then we do this conversion and that's the dangerous part. In this one it's going to blow up, so then it's going to go to the except block, run the except block, and then continue. What it doesn't do is somehow go back and finish this, so these lines are gone. So if you look at it like this: the try starts, hello, this blows up, it goes to the except, it runs the except, and it continues on. It never runs that code. So it's not like you took out insurance on the whole block. Any of those lines can blow up in the block, but whichever line blows up, that is the last line that executes in that block. Okay, so in this particular example the print statement would probably go out there and this print statement would come down here, and you would only put in your try block the single line of code that you think might blow up, because you kind of know print statements aren't going to blow up. So this is a more common real-world example, where the user is going to type some data, and it's users that get us in trouble. Our program starts by asking the user to enter a number, and we know that this could be dangerous, so we're going to put the conversion from string to integer in a try block, and we're going to set negative one if that's a failure. Then if it's greater than zero we'll say nice work, and if it's less than zero, well, not a number. So the first time we run this program, out comes enter a number. We type in 42, which is a string. That 42 goes back into rawstr, runs in here, this runs, it's fine, that becomes a 42 number, so we skip the except block, and ival is greater than
zero, so we print out nice work and we skip the else. Okay, so it says nice work. On the other hand, if we run it again, this time the input says enter a number and we're silly, we enter 42 but in words, forty, f-o-u-r-t-y. So that's a string and that goes into rawstr, and then the execution continues. We run in here, and now this is going to blow up. Normally we would see a traceback right there, but we're not going to, because we put this calculation in a try and except block. It's going to immediately run the except block, set ival to negative one, and continue on with the program. See, you are not blown up at this point. And if ival is greater than zero, well, it's negative one, so we're going to hit the else clause and print out not a number. So we've done error detection. The user sent something that caused a line of our code to kind of blow up, but we put that line in a try and except block, and so we caught it and dealt with that fact. So in summary, we talked about if statements, we talked about else, we talked about try and except, how important indentation is to mark where blocks begin and end, and then elif and try except. So up next we're going to talk about loops and iteration.
Hello everyone, welcome to Python for Everybody. I'm Charles Severance, I'm the instructor for the class, and right now I want to do exercise 3.1: rewrite the pay computation that we did in the previous chapter and give the employee one and a half times the hourly rate for hours above 40. This is rather simple. It's a very classic computer problem because it gives us our if then else, and there are a lot of different ways to write it. So we're just going to enter the hours and the rate and do the pay. I'm going to start by going to my terminal, cd into my desktop, cd python for everybody, and so here we go. Let me go into Atom and get things started. I've already got this file from the last one, assignment 2.3,
and I'm just going to save as, so I'm going to duplicate this file. I am going to go up to py4e and then I'm going to make a new folder. I'm going to call that folder ex0301, for exercise three one, and then I'm going to call this ex0301.py. Now that's just the same file I had before, and if I do an ls you'll see I've got the new folder. So I cd into the ex0301 folder, so that's the folder that I'm in, and I do an ls and I see this file, and so life is good. And I can say python3 ex0301.py. I'll just put in 10 and 10, so it's a hundred. Now the thing is that we're supposed to give time and a half for overtime, and so that means if there's more than 40 hours, say 50 hours at 10, it's not supposed to be 500, it's supposed to be 500 plus half pay for the 10 extra hours, because the 10 extra hours are the 10 above 40. So that should be, I don't know, we'll figure it out, it's easier to run a computer. Okay, so this is the code we've got. What I'm going to do is change this. I'm going to make a new variable, I'll call it fr, which stands for floating point rate, and floating point rate is going to be the float of the string rate. Let me just call this sh for string hours and sr for string rate. So my variable is fr for floating point rate, just so I can keep them straight in my head, and sh is string hours, and so I'm going to say floating point hours, fh, is equal to float of string hours. And now I can change this so I'm just entering fh times fr. So I split this out. I can even print fh comma fr. It's perfectly fine when you're writing this to add extra print statements just for your own sanity. So now we're going to read the two values, we're going to convert them to floating point numbers, we're going to print those floating point numbers out, we're going to multiply them, and then we're going to have the pay. And then I save it. Always remember to save it, right? It has this little dot up here. All editors give you something that tells you you didn't save
it, and so I always keep saving it, because if you come down here and you run it and it's like it didn't change, well, that's because you forgot. So let's do our 10 and 50 hours. We see this extra print statement that came out; that's that extra print statement right there. Let me make this a little smaller and move this over here so we can see it a little better. Yeah, make it a little bigger so we can see a little better. So that print statement's there, and, you know, just for yucks I'm going to comment that out. Commenting things out is a good way to keep stuff in that you might want to turn back on. It's a way of saying I don't want this line to run but I'm just keeping it here. Comments are usually for users to read, but they're also a way to tell the computer to ignore what we're doing. Okay, so the problem here is this calculation: that is not time and a half for overtime. So the way we're going to solve this is with an if statement. If the floating point hours, fr, is greater than 40, I'll say print overtime, and then else colon, the if and the else have to line up, print regular. So I'm not going to do anything different, I'm just going to print the words regular and overtime. Okay, and I just save that. I do it so fast you watch the little blue dot happen and then go away; that's because I hit command S to save it, because I've done this way too many times. So I'm going to run it. I'm going to say, oh wait, hours is 10, 10 hours at 10 dollars an hour, it's regular pay. If I do it this way I can do 50 hours at 10 an hour... that's not good. What'd I do wrong? fr is greater than 40.
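For reference, the check as typed at this point can be sketched as below. This is a minimal reconstruction, assuming the variable names sh, sr, fh, and fr from the walkthrough; the two helper functions and the hard-coded inputs (standing in for what would be typed at the prompts) are hypothetical. The condition tests fr, the rate, so 50 hours at 10 dollars still reports Regular.

```python
# Hard-coded strings stand in for the values typed at the prompts (assumption).
sh = "50"   # string hours
sr = "10"   # string rate

fh = float(sh)  # floating point hours
fr = float(sr)  # floating point rate


def label_as_typed(fh, fr):
    """Hypothetical helper: the check as first typed, which tests the rate."""
    if fr > 40:          # logic error: fr is the rate, not the hours
        return "Overtime"
    else:
        return "Regular"


def label_intended(fh, fr):
    """Hypothetical helper: the intended check, which tests the hours."""
    if fh > 40:
        return "Overtime"
    else:
        return "Regular"


print(label_as_typed(fh, fr))   # Regular, even though 50 hours were worked
print(label_intended(fh, fr))   # Overtime
```

Python runs both versions without complaint, which is why a mistake like this only shows up when the output is checked against a case with a known answer.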
What did I do wrong? It's still saying, for 50 hours, something's wrong here, it's saying it's regular. If fr is greater than... oh, that's because I just messed my program up. So look, I called this variable sh and this sr, and then I get the floating point for the hours, but look, I'm checking the rate, fr. That's the mistake, so really I've got to look at fh. That's a logic error. It's a perfectly fine program, Python's perfectly happy with that, and I was messing up. See, I was typing the rate when I thought I was typing the hours, so I was just typing it backwards. I was crazy. And again, that can be the kind of mistake that you run into. I mean, I didn't do that on purpose; I did it because I haven't drunk enough coffee yet this morning. Hang on, let me get some coffee. I have more coffee; I'm still going to make mistakes. So now there's no little blue dot, so it should work better now. Python, run it. So 10 hours, ten dollars is regular pay. Run it, fifty hours, ten dollars... I did it wrong again. I told you what I wanted to change and then I didn't change it. fh. I'm like crazy again. You can be crazy too. I looked right at it and I did the wrong thing. Hooray, it says overtime. And now you notice, I mean, I'm not being silly here where I put this print statement in. It's so tempting for a programmer to just immediately try to, like, bam, finish it. I don't know why I did it this way, maybe because I know that I'm flawed and I know that I make dumb mistakes like that. So I was just being really careful there. Now let's just be sure: if it's under 40 hours it's going to be a regular computation, and I'm going to leave these print statements right in here for a while. So here's the interesting thing. I'm just going to indent this, because the regular pay is to take the hours times the rate. Okay, and now I've got to come up with a slightly different pay. xp is equal to... and so
there's a couple of ways to do this. We can say time and a half can be calculated by saying the rate times the hours... well, let's do it this way. Let's call it reg: the regular pay is the rate times the hours, fr times fh. And then the overtime pay, otp, is equal to... now we know that we have more than 40 hours, because otherwise we wouldn't be in here. So if I say the number of hours that you've worked, fh, minus 40.0, that is the number of hours above 40. So if this is 50, then fh minus 40 is going to be 10, and then I'm going to multiply that times the rate, fr, except you get one and a half times, right? So your overtime pay is the excess hours times one and a half times the rate. Actually no, I see I made a mistake already. The extra overtime pay, because you're going to get all your regular pay, so the 50 percent, the half, is the extra. And see, I'm confused, so I'm just going to print out reg comma otp, and if those numbers are right I can do xp, which is the pay, is equal to reg plus otp. I have to make it a little smaller so you can see the whole thing, right? So I think I got this right. I've figured out the regular pay, which is giving you the base rate for all 50 hours or 45 hours or whatever, then figured out the excess hours, and this is the bonus amount. There are lots of different ways that you can calculate this, but in general we have an if statement on this side, and this else is the easy one, and that's 40 hours or less, so the if is when we're doing greater than 40.
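The calculation worked out above can be sketched as a small function. This is a minimal sketch, assuming the names fh, fr, reg, otp, and xp from the walkthrough; the function compute_pay is hypothetical and stands in for the script that reads its values with input.

```python
def compute_pay(fh, fr):
    """Hypothetical helper: pay with time and a half above 40 hours."""
    if fh > 40:
        reg = fr * fh                    # base rate for all hours worked
        otp = (fh - 40.0) * (fr * 0.5)   # extra half rate for hours above 40
        xp = reg + otp
    else:
        xp = fh * fr                     # 40 hours or less: hours times rate
    return xp


print(compute_pay(10.0, 10.0))   # 100.0
print(compute_pay(50.0, 10.0))   # 550.0
```

An equivalent form pays the first 40 hours at the regular rate and the excess hours at 1.5 times the rate; both give the same total.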
So I just hit command S to save it, and let me clear my screen with command K. So I'm going to do 10 hours at 10 dollars an hour, and that's a regular calculation and it's a hundred dollars. And if I do 50 hours at 10 an hour, my regular pay was 50 times 10, and then the excess pay is five dollars an hour times the extra 10 hours, which is 50, and so my pay is 550. So it looks like I've got this calculation right. And here's the thing: I've been putting these extra print statements in. It's a real common thing for a programmer, and you can look on GitHub and see code that I write, and I just leave these in because, you know, that could be broken. But you're not supposed to print this extra stuff out, and if you do this in the autograder it's going to complain about these extra things; it's going to consider you broken. So I comment those out to do one final test. Notice the little blue dot, so I have to save it, and now I can run it one more time. I do 10 hours, 10 dollars, and it prints out exactly what I want to see and not the extra stuff, because I commented them out. And I do 50 hours at 10 dollars: 550.
So I've got it right, and at that point you should be able to go back to your autograder, if this was in the autograder, and paste the stuff in. Okay, so I hope this exercise 3.1 was useful, and thanks for watching.
Hello and welcome to chapter four, functions. This is the fourth of our basic patterns; we'll get to iterations next. Functions is the store and reuse pattern. One of the things in programming is that we never like to repeat ourselves. If we have four or five lines of code and we're going to do the same thing later, we don't like to put the same four lines of code in again. It even has to do with reliability: if you find something wrong with those four lines of code and you've got them in 12 different places in your program, then you've got to find all 12 places and fix them. So collect those in one place and then call them and reuse them, and that's the idea of store and reuse. So this is how functions work inside of Python. The first thing we notice is there is a new keyword, def, that stands for define function. The def is like an if statement, or, as we'll see, fors and whiles, in that it ends in a colon and then has an indented block, and then the indented block de-indents and that's the end of the function. So these two statements make up this function. The key thing that you have to understand and get used to is that this def part is actually not running any code whatsoever. It's actually remembering the code, and that's what I call the store phase. The def creates a bit of code and records it, like a macro, although it's much more complex than a macro, and it names it whatever you chose. We named this one thing. And so as a side effect of Python reading or parsing these three lines, it doesn't do anything, but it remembers these two lines are what you would like to run when you invoke thing. So this is the definition of a function and this is the invoking of the function. So this doesn't do anything.
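A minimal sketch of the store phase described here, assuming a two-line function named thing; the strings it prints and the print between the two calls are placeholders, not taken from the lecture.

```python
def thing():
    # These two lines are stored, not run, when Python reads the def.
    print("Hello")
    print("Fun")


# Nothing has been printed so far. Output appears only when thing is invoked.
thing()
print("Zip")   # placeholder for an ordinary statement between the two calls
thing()
```

The def is written once and the function is called twice, so the two stored lines run twice.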
So there's no output here from that stuff right there. But then what happens is you invoke it, and this thing looks like it's part of Python, but you in effect extended Python with your def statement. And so when it sees thing, it goes up and runs your code, and so out comes hello fun, and then it comes back and goes to the next line, which does a print, so that print comes out. And then it goes back, and this is the reuse part: we define it once and we use it twice. Then it runs this code again and goes to the next line and it's all done. So this little bit came out twice. Of course this is really simple so that I can fit it on a page, but you get the idea that I don't want to repeat; this might be, you know, 15 to 100 lines of code and I don't want to type those over and over again. So I say, hey, store these under a name that I choose, and then when I invoke them, bring them back and run them again. Okay, so that's the basic idea. We actually have already been using functions from the beginning. The print is a function, right? Every time we see print, p-r-i-n-t, parentheses, and then we have some stuff in here, we are calling the print function. This syntax with two little parentheses is the syntax for functions. And so input's a function, type is a function, float's a function, int's a function. All these things are built-in functions that come with Python; at the moment that we installed Python, these came along. And then there are other functions that we define and use, and that's what the def is for. In effect we can create new reserved words of our own making that extend the Python language after we define the function. So it's just this bit of reusable code that takes some arguments. We haven't seen any with arguments; there's a little parenthesis and we'll see how that works in a bit. We define it using the def keyword and then we invoke it. There's the defining phase, which actually doesn't run
the code, it just remembers the code, and then there's the invoking phase. You define it once and then invoke it one or more times. Calling the function or invoking the function, we think of those two things as the same thing. Call and invoke are just the terms we use. Most people just say call the function, but invoking it is perhaps a more descriptive way to think about it. So here's an example of a function that is built into Python. It's called the max function, and we can pass some parameters into the max function, so we pass the hello world string. Now, like much of Python, max knows what kind of thing is being passed into it, and it knows that it's looking for the largest character, the lexicographically largest character. In this case the max code scans through and finds the largest character. So apparently lowercase letters are higher than uppercase letters, because we get back a w. And so this is what's called the return value. So this is an assignment statement, let me clear this and start over. This is an assignment statement, so it has to evaluate this right-hand side, and a function call is nothing more than like x plus one: it's something to evaluate. It runs the function code, passes in this argument, and then this residual value, this is called the return value, we'll look at this in more detail, becomes the result of this little bit in the expression. There's nothing else; we could have, you know, w plus one or something. And then the w is what's stored into big. Okay, so we print big, and big is a variable that has the letter w inside of it. And then we ask what is the smallest, and that finds the blank, and so we get a blank. So you see there's a min function and a max function; both of these are built-in functions, they're always there for us. Okay, so here is another example of the max function. We can think of this as invoking or calling this function. As this right-hand side is being
evaluated, we are passing this variable in, and there's some code in here and it's going to do some stuff, yada yada yada, and then it's going to give us back a bit of stuff that's its return value, and then that goes up into big, right? And so that's how this works. And so this is actually built in (built in or burnt in, I guess; I can't draw). You can think of this as: a long time ago, when Python was being first formed, somebody wrote some code, and it's got some stuff in it. It's got a little loop that reads through all the letters, it has to figure out if it's a string or a list, et cetera. But this is the store, except you didn't do the storing because it's already built in, and then this is the reuse. Store and reuse. So we build these things into Python; they're already pre-built, as if before the first line of your code executes, way up here, someone put all this code into Python for you and created a thing called max for you. Now we've been using these built-in functions already. We've got type conversions. We've got float, which takes an integer and returns a floating point version of that. And again, this is kind of like an expression, so it's like I want to divide this by a hundred, but before I do that I've got to convert it to a float. So it has to do these function calls as it's evaluating the expression. Okay, sometimes, like here, it just prints out the return value. That's what this is, this is the return value. If you just type a function, the parameter can be a constant or it can be a variable, and as we'll see in a second you can give it many of these if you like. So you can either just run it or take the result of this: this passes an integer in, converts it to a float, and then puts the float into that, and type tells us what kind of thing that is. And you can use this inside of an expression. And so it's like, what am I going to do first? Oh, I've got to do 2 times this thing. Oh, wait a sec, pause just briefly for a
moment to call out to some float code, pass a 3 into it, and then something comes back. The return value, the residual value, comes back, and in this case it's going to be 3.0, and then that participates in this 2 times 3.0. Okay, and so 2 times 3.0 then being 6.0, et cetera. But you can see it's like, oh wait a sec, I've got to figure out what this is, call the function, get the return value, and then continue processing this expression. We've also done this with string conversions, partly because, just as an example, the input function always returns a string. And so, you know, this string could be coming from input, but we'll just take one two three. We know that that's a string, it's not the number 123, and if we try to add one to it we get a traceback: cannot concatenate string and integer. But we can convert that string to an integer. And so int can take a floating point number or an integer or even a string, and it says, oh, I know what I'm supposed to do with the string: I'm supposed to interpret these as digits and, you know, multiply by 10 and figure out what the hundreds place is and all that stuff. There's a little bit of work to that and it does it, but then it gives us back an integer. And we say, oh, what is that? That's now the 123, but it is now of type int, and now I can add one to it and get 124.
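The conversion just described can be sketched as follows. This is a minimal sketch; the variable names sval, ival, and bad are assumptions. Adding 1 to the string raises a TypeError, which is caught here so the script keeps running, and int then turns the digits into a number.

```python
sval = "123"            # a string, as input() would return

try:
    bad = sval + 1      # str + int is not allowed
except TypeError:
    bad = None          # flag that the addition failed
    print("cannot add 1 to a string")

ival = int(sval)        # int() reads the digits and returns an integer
print(type(ival))       # <class 'int'>
print(ival + 1)         # 124

print(2 * float(3))     # float(3) returns 3.0, so this prints 6.0
```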
And as before, from this example that we're kind of reusing from a previous chapter, you don't want to try to convert (oops, sad face, sad face, sad face) something that doesn't have digits using int, because it'll say I don't know what to do, and then your program quits, right? You don't want your program to stop with tracebacks. You can of course deal with that with try and except, but that's a previous lecture. Okay, so up next we're going to talk about building our own functions, not just using the predefined ones.
So welcome back. We're going to continue and start talking about building our own functions. So again, we use the def keyword to define a function and then later we're going to invoke it. And there's a bit to it. We are defining the name of the function; in effect we're extending Python and creating new predefined things that we can use, except it's our code. It starts with the def keyword, has some optional arguments, which we'll see in a bit, that's what the parentheses are for, and then the name, and the function names follow the same rules as variable names. Then you have an indented block, whatever code you want to do, and then you de-indent, and that sort of defines the essence. The key thing here is this is not calling, it's not invoking, it's not executing; it's remembering, it's storing, it's figuring things out. So here is the output of a program that defines a function but then doesn't use it. So this is a sort of broken function. So here we go. We start: x equals five, print. You don't have to have all the defs at the beginning; the def runs whenever. So, you know, out comes hello, and then we define a function, and Python says, oh, you want to make a new thing here, so I'll make a new thing. It's kind of like a variable in a sense. And then it copies this stuff up there and says, later you probably are going to want to use this, so I'm going to remember it. So it doesn't do anything there, no output comes out. Then it says
print yo, and out comes yo, and then it adds 2 to x, so x is now 7, and then it prints x, and there's the 7. These print statements never ran. Why? Because we did not invoke them down here. We defined them but didn't invoke them. So let's take a look at how you invoke a function, right? You define it and then you use it. Sometimes you define it once and use it once, but more commonly you define it once and use it more than one time; again, the store and reuse pattern. The def is the store and the invoking is the reuse. So here's just a slightly different version of that last program, and now it's going to actually invoke it. So x equals 5, print hello, def. Out comes hello; the def produces no output, right? But because there's a de-indent here, that is the entire blob of code that is part of print_lyrics. So it prints out yo, and now we're going to invoke. This is the call; we're going to call the function. Now the function goes up... let's clear this. So we're down to here, and now this suspends at this place. It's like, remember to come back to here when we're done, go up, run this code, and then come back and continue on. So it leaves like a breadcrumb of where it's supposed to come back to, and then it runs, and print_lyrics of course produces the two lines of output (and, yeah, that should probably not be there, that should be up there), and then x equals x plus two, which makes it seven, and then it prints out seven. Okay, so this is the invoking: invoke or call the function. You defined it and then later you called it. Now, in addition to just call and return, we can pass parameters in. The example of a parameter is in the max function: we have to say this is the thing I want you to find the maximum of, the largest thing. And part of it is, in the whole store and reuse pattern, we have a few lines of code, but sometimes we want to do ever so slightly different things in the different invocations,
and so we use the arguments to subtly adjust. Finding the maximum is a general thing, but being able to say what thing to find the maximum of makes a function that's much more useful and reusable in a lot more situations. So arguments are the things we pass in, and for functions that we're going to build we define them on the def statement. So we say def greet, that names a function, and then this is the arguments, the things that are coming in. Now this lang variable in a sense only exists during the life of the function, and it represents sort of a placeholder. It's not a real variable in the same sense; it's a placeholder that refers to how you touch that first parameter that's sitting in there. Okay, and so lang is our first parameter, whatever it is. We don't need to see this part down here right now. All we know is we're going to make a function and we're going to take a parameter, and this lang is the placeholder that tells us what that parameter is. Okay, so within the function we're going to check to see if the language is Spanish; if it is, print hola. Else if the language is French, print bonjour. Otherwise print hello. We have a very highly simplified language translation system here. So the def of course does nothing except that it remembers that and defines the concept greet. So that comes down and now we're going to call it. That says go look up the thing that I defined called greet. If you hadn't put this in, greet would give you a traceback, but you extended Python and named it greet. So it suspends the code here, starts up here, and lang is now an alias to en. So now we can run: if, is that es? Oh, else if... oop, I'm getting it all wrong now. Right, so en comes in as lang, we're coming into the code, it's not es, it's not fr, else it prints hello, and then it comes back to the next line. And then we call it again, and this time es is lang, and so it runs this code and prints hola, and then next time it calls with
this and then prints bonjour. You get the idea. So this is a placeholder so that on the successive calls or invocations of the function we can get at whatever the programmer put in as that first parameter. And so we are saying in this definition: we are ready to receive a first parameter, please call us with a parameter, and then we will be able to do something slightly different for the different values. So this is a reusable bit of function that prints hello in three different languages, and then we tell it what language at the moment that we're actually invoking it. So that's putting stuff into the function. Now, getting stuff back out is the concept of returning, and the return statement. The return statement is an executable statement that does two basic things. The first thing that it does is it finishes. Now this is a one-line function, so that's kind of redundant, but when Python gets to the return statement it doesn't continue on to the next line, it just returns. That is the end of the invocation of that particular function. But even more importantly, it takes a parameter. You can say return without a parameter and it will stop the execution of the function, kind of like a break does for a loop: get out, we're done, don't run that next line. But it also allows the specification of what you want as the residual value in an expression. So we're doing a print and then we're saying greet, and what's going to show up here is whatever this function gives in its return statement. And so that prints hello; we call it again and it prints hello again. And so basically the return value is what I call the residual value. It's like what shows up here when the function is all done, and it's the string hello. We call the functions that return values fruitful, because they produce something. But you don't have to; you can just say return, or you don't even have to have a return statement. It goes to the last
line of the function and it does return automatically at the last line of the function so here's a little bit of a rewrite of our little language program we are going to create a greeting program we're going to take the language as the first parameter and instead of just doing a print statement which is what we did before this is now more like a function because it takes some input and produces some output as a return rather than just printing it's a little tacky for a function to print and so here we return hola bonjour and hello based on the right thing so now we say print greet en so it runs the code once lang is en and then it runs this code and the residual value is hello so it says hello glenn and similarly when it runs this code it passes es in as lang it runs through and it runs this statement if there were more statements it still wouldn't run them as soon as this return runs that says that this bit right here is now hola so hola sally and the same with french goes in runs again out comes the return statement and then bonjour michael so you see how we can control as we're writing the function what the residual value is that we want to see in whatever expression is calling us sometimes we have returns and sometimes we don't have returns so if you think of the max code that we talked about before we can kind of see that somewhere inside that max code there's a return and that's how it communicates the w back to us so we pass in as argument hello world it comes in as a parameter inp and somewhere it's going to loop over and over through inp and then at some point it's going to figure something out and tell us what it wants to send back to us with a return statement and so the w comes back and gets assigned into big you can have more than one parameter and they're just in order the first one and the second one three and five so three becomes a and five becomes b
and away we go so we just use this to add two numbers and so three plus five is eight so you get as many as you like and the order matters and if you do things like you tell it you want parameters and you don't give them to it then that'll become a traceback and it'll blow up we'll also talk about optional parameters later so you don't have to have return values and that means that you simply don't call return with a value and return is always implicitly happening at the last line of the function so that's kind of the basics of how functions operate but i don't want you to get too excited about writing functions some programming classes are like gotta write a function gotta write a function functions to be clear are a very powerful mechanism and as we write programs of 200 lines of code a thousand lines of code 10 000 lines of code the concept of a function is really important we would go crazy if we didn't have functions but if you're only writing 20 lines of code forcing yourself to write a function is kind of pointless so don't worry about maybe the lack of urge to use this we are calling lots of predefined functions and we will for the next couple of lectures there will be a time when you go like oh i'm sick and tired of repeating myself oh yeah time to write a function so that's why we don't push functions prematurely we just want you to know what they are use them and at some moment you'll be like oh i want to define one but don't worry it might take a while before you really want to define a function so that kind of summarizes our lecture on functions and up next we're going to do iterations [Music] so hello and welcome to python for everybody this is another of our worked examples and i'm your instructor charles severance so the example we're going to do right now is exercise 4.6 in chapter 4.
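As a recap of the functions lecture before the exercise starts, here is a minimal sketch of a greet function that returns its result instead of printing it, plus a two-parameter function. The name addtwo and the exact capitalization of the strings are assumptions for illustration; the lecture itself only names greet, lang, a and b.

```python
def greet(lang):
    # lang is the parameter: a placeholder for whatever argument the caller passes in
    if lang == 'es':
        return 'Hola'
    elif lang == 'fr':
        return 'Bonjour'
    else:
        return 'Hello'


def addtwo(a, b):  # hypothetical name; the first argument becomes a, the second b
    return a + b


# the returned string is the residual value of each greet(...) expression
print(greet('en'), 'Glenn')
print(greet('es'), 'Sally')
print(greet('fr'), 'Michael')
print(addtwo(3, 5))
```

Nothing inside greet runs until it is called, and each return ends that call immediately.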
this is an exercise where we're kind of taking code that we've already written and redoing it in a way to just prove that we can do it with a function so it's not like it's going to do anything different it's going to do exactly the same as before and we're just going to do it a different way so let's go ahead and start up our text editor atom and show you a couple new features of this it's already been open let me sort of open it here again i'm going to get a new window and then i'm going to open but what i'm going to do is open this folder right so it's this folder right here and i'm going to open that so instead of opening a file which is what i've been doing so far i'm going to open a folder so now you can see all the folders that i have and they're just exactly the same folders that i've been making under desktop py4e and so i can go and look at my previous assignments the other thing i can do is i can control click on this and say new folder and i can say ex 04 06 again i like the 04 and the 06 just so things line up when we get to chapters 10 and 11 that's why i'm naming my files this way and i'm going to adapt the code from exercise 3.1 we could do exercise 3.2 but it's longer the difference between 3.1 and 3.2 was one uses try except and the other one does not so i'm going to start with this one i always clean up and get rid of that so here is our time and a half for overtime logic and i am going to say file save as so i'm making a copy of it and i am going to put this in this folder exercise 4 6 and i'm going to make sure to rename my file 04 06.
okay so now i've got exercise 04 06 which is this one right here that's the file we've got and it's there and so let me pop up my terminal window cd desktop cd py4e cd ex 04 hit the tab for 06 because there was only one in 04 and so i can see where i'm at and i have one file here ls minus l gives extended information and i can say python3 ex04 py 10 10 and away we go okay so this is this code i'm gonna get rid of some of this here just these print statements which i commented out just to make it a little more dense okay so let me start by just putting a function in here and of course def is the keyword for function and compute pay is the name of the function and it's supposed to take two parameters hours and rate and then you add a colon and then we're indented right so the indent is to determine how long the function lasts how many lines it is and i'm just going to say print in compute pay give myself a blank line i'm going to save that now i'm going to run it and you'll notice something real quick this line never came out and that's because the way this works is this simply defines the function and then it continues running but it doesn't run that code and it ran this code and then it did in this case the else code and then it did the print but it never came up and actually ran this code we have to call the function okay we have to call the function before it's going to actually run so now let's make a call compute pay and let's pass in our variables that we have in this main code fh and fr so that's compute pay fh and fr and then i'm going to print out hours and rate so now you'll see it's going to do the input convert to floating call the function which is going to run this one line and then continue down here right so now what it's going to do is going to define it run this run this call compute pay which is going to temporarily suspend come in here run this code and fh is going to map to hours and
fr is going to map to rate and then it's going to come back and it's going to continue down here running this code and then finishing okay so that's what's going to run so this time we will see in compute pay and we will see the number for hours and rate so i'll run it again 40 hours 10 dollars an hour ah we see a mistake i'm like and then you're going to see the words in compute pay here and i'm like okay that didn't happen well what's going wrong let's take a look at what's going wrong i don't plan to make these mistakes i just make them let me show you what's wrong here see this little blue dot that's what's wrong with my code and if you don't catch that then you'll say oh i'm crazy well i'm not crazy i just made a mistake i forgot to save it file save or command s so now i saved it i didn't change a single line of code it was right i forgot to save it so now run it again 40 hours 10 and look now i see this line so it came into compute pay and then came back and then finished this part and did compute pay okay now this isn't enough what i need to do is i need to move this code the actual pay computation so i'm going to cut that and i'm going to put it in here but now this variable fh belongs to the main program it doesn't belong to this code now i don't want to use that so i want to say hours so i'm taking whatever hours are being passed in and then using rate and i'm just kind of changing all of these to match the parameters inside the function this is hours and this is rate and this is hours and this is rate now i'm going to put a print statement inside i'm going to say returning and then i'm going to print xp and then still inside the function i want to return a value from this function so i'm saying the returned value for this function is actually in the current function variable xp and then actually i'm going to not call this xp i'm going to call this pay in here oops not with capital i don't want it to be capitalized
pay and pay and pay so it's returning pay and then in here i'm going to say xp equals compute pay so what we've achieved here is let me get rid of that what we've achieved here is it's going to come down define a function then read the input convert it to floating point and then call compute pay passing in fh as hours and fr as rate right and then it's going to take hours and rate do whatever it does with the if then else it's going to print these things then it's going to return it and then whatever this pay variable is effectively goes back into this expression as the return value and again gets assigned into xp okay so that's basically what's going to happen and then it'll come back and then print xp okay so let's go ahead and run oh wait i almost did it again look at the little blue dot you guys can watch the blue dot and when i forget to save it and when i then run it it'll blow up so ten dollars an hour oops ten hours fifteen dollars an hour so now it worked so you can see these two print statements come from while this is running the input happens it calls the function and then it prints out the pay and now to finish this to get it right i will simply comment out my little helpful debugging print statements and run it one more time 40 hours 10 dollars an hour and run it again and have it be 50 hours and 10 dollars an hour so 500 550.
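The finished exercise might look roughly like the sketch below. The spelling computepay, the intermediate names regular and overtime, and the hard-coded strings standing in for the two input() prompts are all assumptions; only fh, fr, xp, hours, rate and pay are named in the walkthrough, and the exact arithmetic on screen is not shown in the captions.

```python
def computepay(hours, rate):
    # hours and rate are the parameters; fh and fr from the main program map onto them
    if hours > 40:
        regular = hours * rate
        overtime = (hours - 40.0) * (rate * 0.5)  # the extra half for hours past 40
        pay = regular + overtime
    else:
        pay = hours * rate
    return pay  # becomes the value of the computepay(...) expression below


# main program: fixed strings stand in for what input() would return
fh = float('50')
fr = float('10')
xp = computepay(fh, fr)
print('Pay:', xp)  # prints Pay: 550.0
```

Keeping the main program's names (fh, fr, xp) separate from the function's names (hours, rate, pay) mirrors the renaming step in the walkthrough.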
so that one is now working so that one is completed and we have moved our computation inside of the function so i hope that you find this useful i hope that you find this entire course useful and thanks for watching hello and welcome to chapter five loops and iteration now we're going to work on our fourth basic pattern of sequential conditional store and reuse and loops and iteration and this is the one where we teach the computer how to do things a lot we can tell it to do something a million times and so that's where we get the doggedness of computers or the fact that they're so good at doing work for us because we can set them off to a task and they'll do it until it's done so here's a very simple loop let's put the coffee over here the keyword that we're going to start using is while we're also going to use for later on and the while loop functions very much like an if statement the while starts it and then this is just like an if statement it's a question that leads to a true or a false answer and then there's a colon and then there's an indented block and then we use the de-indent to determine how long the loop is and so this print is de-indented so that indicates the end of the loop and so at some level what's going to happen here is it's just going to run and if this is true it's going to run this code and if it's false it's going to skip the code in that way it functions like an if the place that it doesn't function like an if is after it's run the code once it goes up and then asks the question again and so you can think of it going back up kind of to the top of the while loop and then re-asking the question like okay is this going to run again and then it's going to do that some number of times and then it's going to finish and so that's the loop that's the iteration and we're going to make a variable we're going to construct very carefully a variable that we
call the iteration variable and that's n and it's a variable that's going to change and it's our way of running the loop but not running the loop forever so uh let's just run this we come in and it's five is n greater than zero yes it is so we're going to run this code so we're going to run this code we're going to print out five then we're going to subtract one and then we're going to go back up go back up and ask the question is n greater than 0 and the answer is since it's 4 the answer is yes so when it runs again then it prints out 4 subtracts it again checks prints 3 subtracts it again prints two subtracts it again prints one subtracts it again now n is zero and so it comes back up comes back up is this question has now become false so it's going to take the exit so it's going to come down and run this line right here then it prints blast off and we can kind of print out the residual value of n just to sort of prove to ourselves that it ran until n was no longer greater than 0 and then 0 was the final value for n and we carefully constructed this n n equals oops go back we carefully constructed n we set it to five then we carefully subtracted one each time through the loop and then we're using that to control when to exit the loop and so you could think of this loop as for now running uh five times true true true true true and then false finally so this question was true for a while and as long as it was true the loop ran and then when it finished when it turned false the loop stopped and so this variable that we construct to control the loop was called the iteration variable because it tells how many times this loop is going to run over and over or otherwise known as iterate so this is a badly constructed loop with an iteration variable that we didn't do very well and so if we take a look at this we start out with n5 and then this is greater than zero so it's true so it runs it and then it runs it again and then is still greater than zero so you can pretty 
much see because we're not changing n this is going to be true true true true dot dot dot forever true forever and so this is an infinite loop and uh it's just going to run until your computer runs out of battery or you hit the button this is the kind of thing where you often see your your computer spinning like a spinning beach ball or some other indication that your computer's super busy it's in some kind of a loop really tight and it's running something and it's using up all of the processing resources of your computer that's an infinite loop and so the problem is we did nothing with the iteration variable here's a different loop and so this one demonstrates a different idea so in this case we start out with n is zero and it comes in here and is n greater than zero question mark and the answer is false so it skips it it doesn't run these lines of code at all and so this loop doesn't run at all because it comes in asks the question it says no and then it skips right around it so never run never run and so this actually is sometimes you write a while loop on purpose like this not quite as simple as this one but the idea is is this is this emphasizes that these loops are what we call zero trip they are not even guaranteed to one run run once they're they're going to run maybe zero times and in this respect it functions exactly like an if statement right meaning the first time through the loop if it's not true it's just going to skip right by it so there's a couple of ways of getting out of loops in this case i'm constructing an infinite loop because remember the kind of definition of an infinite loop is if this is going to stay true well true is the constant true so this is going to run forever and what it's going to do is it's going to prompt with a little uh little arrow and then let us type and read whatever we type into the variable line and then if the line is done we're going to break now break is an executable statement and if you hit the break it exits the 
innermost loop out to the to the place beyond the the end of the loop so when this runs the first time and we say hello there line is not done so it prints it so it prints out hello there and then goes up and then we type in again we type finished and so it doesn't it's not done so it prints it so now comes that print statement then we type in done and now this becomes true and it comes out and runs the code beyond the end of the loop the key is it doesn't go back it's like once you've done a break that loop is done and so you so you look at basically you know the block that is the loop so here's kind of the loop block and then the break goes to the line after the end of the loop block and you can think of this as sort of like just a hyperspace jump there is nothing really this could be literally hundreds of lines with if statements and you could be running and doing all kinds of stuff and running on doing all these things you know and these things could run all kinds of ways right the point is is as soon as you hit a break statement however much stuff is down here however much stuff is up here it exits to whatever the next line is beyond the end of the loop continue is another loop control statement but it works differently than break so break says get out of this loop um continue effectively says uh stop this iteration we're done with this iteration and so continue says go up back to the top of the loop oops yeah go up back to the top of the loop and so here we read a line if the first character is a pound sign line sub zero if that first character is pound sign we're going to skip it and this is a way for us to make like little comments in our typing and then we print if the line is done we get out and otherwise we print it and so that's why there is no printout here because it comes in runs oops it comes in uh hit this is true and that goes back up but it comes back and prints out the next one and does another thing and so the loop continues whereas the break 
ends the loop and so again the same kind of notion that you're sort of doing all kinds of complexity wherever you're at in this loop you hit continue and it doesn't go any further it goes back up and asks the question and it might exit the loop in that particular case but this one here is a true this is an infinite loop that i've constructed but it is not really an infinite loop because at some point the break gets us out of the loop and so it's an infinite loop with break to escape it and that's another common way to construct a loop so these loops that we've been drawing so far the ones that use while as their keyword are what are called indefinite loops and that's because they kind of go for a while till a break hits or until some value becomes true i mean as long as that value remains true so all the ones we've done so far are easy to look at and know that they look pretty good and they're probably going to finish but there are sometimes if they're long and complex and their termination conditions are a little more complex it's not clear that they're really going to terminate and so we can use while loops for a lot of things but for most of our looping we're going to use what are called definite loops and that's what we're going to talk about next [Music] so definite loops use the for keyword and the idea of a definite loop is it's going to loop through some set of things it might be a set of lines in a file it might be a set of characters in a string it might be a set of strings in a list of strings but whatever it is it's sort of going to run a finite number of times depending on the thing that it's looping through and we like this and it's an easier way to construct it and we actually don't have to deal with the iteration variable the for loop includes a mechanism to construct the iteration variable for us so definite loops iterate
through the members of a set so here's a very simple for loop and so you see the for keyword and in is also a keyword and the iteration variable is something we put right here this i is like an assignment statement and i is going to take on successive values so i is going to be 5 the first time through the loop then i is going to be 4 the second time through the loop then three two one so i is going to be assigned five different times to five different values and then the loop is going to run it's going to run once with five once with four once with three once with two and once with one and so this block of code we have contracted to say execute it five times with these values of i i is that iteration variable i is the thing changing through each iteration of the loop okay and so that's why this prints out five four three two one and then when it's done it finishes it so this is a much more direct syntax for looping five times and setting an iteration variable you kind of combine it all into this one thing so it's quite nice so you don't have to be going through a list of numbers there's all kinds of things that we can iterate through with for and by the way while i'm sitting here i named my variable friends because that's a list of strings and friend which is the iteration variable i'm using singular and plural because it helps you read it python doesn't understand singular and plural so just because you say friends doesn't mean python knows it's a list python does know it's a list but it doesn't know by the name of the variable i've chosen that's your basic mnemonic variable warning these are cool variable names but i don't want you to get confused by them so you can loop through a variable so we're going to take this list of three strings and stick it in friends and so friend is going to iterate through that so the first time through friend is going to be joseph second time through it's going to be glenn third
time through it's going to be sally and so that just says run this loop run this code the indented code three times each time the variable friend takes on a successive value that's in the friends list so it says happy birthday joseph glenn sally and then we come out of the loop and we print done so if we try to draw a picture of what this is really doing the for loop is actually doing a whole bunch of stuff that we would have to do with maybe separate statements in the while loop first it decides how many times to run the loop so it's answering the done question which way do we go and it is also then moving i ahead it's managing the iteration variable it's initializing it too if you go back to the while loop we had n equals 5 while n greater than 0 n equals n minus 1 so we had like three lines to control the loop to manage the iteration variable but with a for loop we don't have to do that and so that's all taken care of and so that basically says you know by using a for loop are we done no we have five things to work through set i to the first one run it we're not done because we've got more set it to the second one third one fourth one fifth one and now we're done and that is all handled in a single line of code and that includes the iteration variable and the set of things through which we're going to iterate i really like the word in i mean it reminds me of set theory where you say this is a member of this set or the for each and math isn't important here but if you do know math the vertical bar means such that right as a member of this set and that kind of stuff member of the set i'll erase the math stuff so we don't over math but it's like for each of the values in the set 5 4 3 2 1 run this loop setting the iteration variable i to the members of that set so in reminds me for those of us who are math oriented in reminds me of a really nice concept in
mathematics okay now you could think of this as sort of this looping structure where the for loop and this is pretty much how it actually runs inside the computer right where it initializes i runs this thing five times and then exits that's one way to think about it but you could also think about it in a somewhat more abstract way and think of it as all we're really doing is we have a contract with python that says we're supposed to run this code five times and i is supposed to be five four three two and one so you could imagine this might be what's going on the for loop sets i to 5 runs our code the for loop sets i to 4 runs our code the for loop sets i to 3 runs our code the for loop sets i to 2 runs our code the for loop sets i to 1 and runs our code all we know is our code ran five times and by contract each successive time we'll get a different value for i and that value for i is taken from this set and so this is just one way to think about it to say to yourself oh yeah this is one way to think about it and this is how it really works but this is also kind of logically the contract that python is making for us so up next we're going to talk about taking this notion of doing something to a lot of items but accomplishing something with that and i call these loop idioms [Music] so now we're going to talk about loop idioms and loop idioms are patterns that have to do with how we construct loops we have the mechanics of fors and whiles but ultimately we want to get something done we want to solve a problem with a loop and often what we have to do is if we have a set of things whether it's lines or strings or characters or numbers we're looking for something like the largest or the smallest or we want to add them up or something like that and so we can't just say add them up we have to say go through each one and do something to each one and somehow achieve adding them up and the pattern that we're going to follow
is we're going to have this loop that's going to do all one run once for each thing right in some chunk of data and then but we're going to set something at the beginning and then we're going to do something to each one and at the end we're going to kind of get the payoff we're going to get the result so if we're doing sort of summing things we're going to have a running total and so this will be like t equals zero and then this will be t equals t plus the the thing value and then but this is not the real total it's the running total during the loop but at the end it is the real total and so we're gonna we're gonna look at what you do at the before the loop starts during the loop and then what you get after the loop and how you can use that so we're gonna use this loop it's just gonna loop through a set of six numbers over and over and over again right so we're going to do something before the loop we're going to do something after the loop and then we're going to run loop some number of times and in this case thing is our iteration variable because i'm using unnemonic variables now so it's going to run you know 9 41 12 3 74 and 15. 
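A minimal sketch of the bare loop described here, which does something before, runs once per number, and does something after. The exact text of the two outer print statements is an assumption; the list and the name thing come from the lecture.

```python
print('Before')                        # what we do before the loop starts
for thing in [9, 41, 12, 3, 74, 15]:
    print(thing)                       # runs six times, once per value
print('After')                         # what we do once the loop is finished
```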
so it's going to run and print these things out so it runs this loop six times and away we go now this loop does nothing except print stuff out of course i'd like to do that first is always print things out to make sure that sort of my brain is functioning so to kind of understand how these loops work i'm going to ask you to function as a program and i'm going to show you some numbers in succession and i want you to mentally figure out what the largest number is but more importantly think about how your brain is solving this problem of what is the largest number given that i'm only going to show them to you one at a time for a little while and your brain has to do something and imagine i was going to show you thousands of numbers i'm not but imagine that was how would you organize yourself in a way so that for like an hour and a half you could sit here as i showed you numbers and you keep track of the largest number that you've seen of all the numbers okay so here we go here's your first number second number third number fourth number fifth number sixth and last number what was the largest number um what was it well it wasn't too hard it was 74 but that's not the question how did your brain arrive at 74. so here's all the numbers if i'd shown you all the numbers and asked you um what's the largest number your eyes would have sort of gone and then you got to 74. 
and and you wouldn't do it in any particular order your eyes would just like see the 74 and it would just throw smaller numbers away and it would move really quickly to what the answer is even if there was several hundred numbers on the screen your mind would sort of move fluidly wherever it felt like moving and then arrive at it and probably what it would do is it would do something like you know kind of move like this find this and then sort of check to make sure that it's okay then say like okay i got 74 i'm done that's not how computers do it that is not how computers do it they do not move fluidly but they are highly dedicated they're going to do something [Music] 74 but how would you construct a loop to achieve this so let's take a look you could create a variable called largest so far and this is the largest variable the value that you've seen in the list so far now i haven't shown you any numbers yet so we'll just set this to negative one to get us started so now we see three and we're like oh that's better than negative one it's our first number so it's probably the largest we've seen so far right great 41. oh that's bigger than the largest we've seen so far so we'll keep it 12 is not bigger than 41 so we're not going to keep it notice this keeping thing 9 is not bigger than 41 so there's no point to keeping it 74 is bigger than 41 so we'll keep it is this the largest number we don't know we don't know until we're done 15 not better than 74. 
so now we're all done and hooray hooray hooray we have the largest number and we had this variable that we kept the largest number that we'd seen up to this point and then when we know that we're done at the end then that becomes the largest so if you look at all the numbers keeping track of the largest so far at the end of all the numbers the largest so far and the largest are the same thing and so that's how you get this idea of something you're doing doing during the loop is not really the answer but by the time the loop is done you will have the answer and so here's a bit of code that does this use it with our numbers right so let's take a look so i have this variable called largest so far i set it to negative one before the loop remember there's a loop before and a loop after and loop in the middle before it's negative one so now the num remember underscores are okay that's my iteration variable if 9 is greater than largest so far well largest so far is negative 1. so that's true so this code's going to run so we're going to remember the new number so this is nine and so nine ends up in largest so far and then we print it out so largest so far is nine after we saw the number nine then we do it again so now 41 comes in and is 41 greater than 9 the answer is yes it is so we're going to run this code copy 41 into 9 41 into largest so far and then print it out and largest so far is 41 after we saw the number 41. now we're going to run the loop again with 12 okay and you get the idea i hope is 12 greater than 41 which is the largest we've seen so far and the answer is no it is not so we skip so the largest so far stays 41 even though we saw 12. meaning we're sort of like ratcheting up but we never ratchet back down so we run it again with 3 and 41 and we skip this and then the largest so far is 41 even though we just saw three and now we see 74 is 74 greater than 41. 
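The loop being traced here can be sketched roughly as follows. The names largest_so_far and the_num are guesses at how the spoken names are written on the slide, and the list order is taken from this trace.

```python
largest_so_far = -1                    # set before the loop; we have seen nothing yet
print('Before', largest_so_far)
for the_num in [9, 41, 12, 3, 74, 15]:
    if the_num > largest_so_far:
        largest_so_far = the_num       # ratchet up, never back down
    print(largest_so_far, the_num)     # largest so far, then the number just seen
print('After', largest_so_far)
```

Starting at -1 only works because every value in this list is positive; with a list of negative numbers the initial value would wrongly survive as the answer.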
see we never are looking at all the numbers we're only looking through a window at the current number so is 74 greater than 41 the answer is yes so we run this code and then we capture the 74. so we just saw 74 and it is the largest so far and then we run it again with 15 but 74 is our largest so far and so it skips so 74 remains largest so far after 15 and now we're finished because we just ran the last thing the for loop takes care of everything and jumps to this print statement and says afterwards largest so far is 74 but at this point it's also the largest right so largest so far became largest when our loop finished so that sort of gives you this notion of how we construct you know something at the beginning some kind of thing that we're going to do over and over and over again and then something at the end and we put some print statements in just so we can watch it and see what's going on so coming up next we're going to talk about some more loop patterns some counting totaling averaging and finding the smallest number [Music] so now we're going to look at some more patterns of the different things we can do at the top of the loop in the middle of the loop and at the bottom of the loop and the first one we're going to do is counting now we're going to take a look at the number of something the number of things in our list now we could just inspect it and see six but you will have for loops like you're reading through a file or you know scanning through some data and so with the notion of counting you have to assume that you don't really know you know dot dot dot dot that there's going to be a lot more than just six but for now we're just going to do six and we're going to count how many things we see in this loop and the pattern is simple you set a variable zork to zero at the beginning we often call this variable count to be mnemonic and now we're going to run this loop six times one two three four five six and each 
time through we're just going to add one to zork so zork starts at zero then it goes one two three four five six and we're going to print it out so you know we see the nine and zork is one we see forty one and zork is two and so on and zork is six when we see the fifteen the for loop stops and we print out afterwards and this six is then the ultimate count that we got so that's very very simple the pattern is that you set it to zero at the beginning add one to it and if you run that enough times then this is how many times that happened and in a sense it's how many times this line ran right sometimes you put this in an if statement etc etc etc okay oops now we can do the same thing to get a total the way the total works is you compute a running total of the items that you've seen so far and at the end the running total in effect becomes the total really a better variable name for this would be like sum or total or something but i'll use zork again so you set zork to zero and it starts out the total we've seen so far is indeed zero and then we're going to run this one two three four five six times and thing is going to be the iteration variable it's going to take on the successive values and each time through we're just going to take our running total and add to it the thing we've seen so we see 9 and the running total is 9. we see 41 and the running total becomes 50. we see 12 the running total becomes 62. we get a 3 it becomes 65 we get 74 and the running total is 139. 
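The counting pattern described above can be sketched as follows. This is an illustration rather than the exact slide code: zork and thing are the names used in the lecture, while the list name numbers is assumed.

```python
# The same six numbers; pretend we do not know in advance how many there are.
numbers = [9, 41, 12, 3, 74, 15]

zork = 0  # the counter starts at zero before the loop
print("Before", zork)

for thing in numbers:
    zork = zork + 1  # add one each time through the loop
    print(zork, thing)

# After the loop the running count is the final count.
print("After", zork)
```

The total pattern that the lecture starts next has the same shape, except that the line inside the loop adds thing instead of 1.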
how many more how many more are we going to see we don't know could be a million could be one oh it's only one we get a 15 our running total is 154 and what's true at any moment here is the running total is right for what we've seen so far now when we're done the for loop quits for us and afterwards 154 is indeed the total so it's the running total while we're in the loop and after the end of the loop we have the actual total so it's not very difficult to convert this to the average because we've calculated the count and we've calculated the running total and now we're going to have the average by simply dividing those okay so now this time i've used mnemonic variables don't get confused by this mnemonic variables are just friendly names i chose for you to read the code easier i am not communicating to python in any way by naming this count and sum but count and sum are nice okay so i set count to zero and sum to zero at the beginning and the count is zero and the sum is zero and then i'm going to run this loop six times one two three four five six and each time value is the iteration variable every time i run the loop i say count equals count plus one sum equals sum plus value so i have a running count and a running total and they show up here one two three four five six and then the running total and then at some point the for loop you know we do the last one and the for loop jumps out and 6 and 154 are the count and running total and then it computes the average sum over count okay so that's just again a pattern of something in the beginning something in the middle something in the end another kind of thing we tend to do in loops is we look for things we hunt for things and so this is where we have an if statement inside of a loop and of course i've created a silly simple thing in this code i am looking for large values that are values that are greater than 20. 
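As a sketch of the running total and average described above: the lecture names its variables count and sum, and this version uses total instead of sum so that Python's built-in sum function is not shadowed. That rename and the list name numbers are assumptions made for illustration.

```python
numbers = [9, 41, 12, 3, 74, 15]

count = 0  # running count
total = 0  # running total (the lecture calls this variable sum)
print("Before", count, total)

for value in numbers:
    count = count + 1
    total = total + value
    print(count, total, value)

# Only after the loop are these the real count and total,
# so this is where the average can be computed.
average = total / count
print("After", count, total, average)
```

With these six numbers the loop finishes with a count of 6 and a total of 154. The division is safe here because the list is not empty; with no data at all, count would still be zero.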
and again don't think of this as just six numbers but i'm looking for all the large values and i'm going to print them out so you know it says before it's going to run this well if 9 is greater than 20 it's false so it goes back up 41 true so it prints out 41 then goes back up 12 false goes back up 3 false goes back up 74. true so it runs this so out comes that little print statement goes back up and then 15 is the last one and that's false it goes back up and the for says we're done and then we do afterwards and so this is just the notion of having an if statement inside of a for loop where we're sort of picking or choosing or selecting or looking for something in a large set of things that we're looping through we can also say i want to know if a particular value is there and so we're going to use a boolean variable we've talked about integer variables like 1 42 and then floating point variables like 98.6 and then string variables like hello world that have quotes in them this is a fourth type a kind of variable it's called a boolean variable and it only has two values it has true and false matter of fact these if statements they return boolean values value equal equal three that is returning a true or a false based on the value of value there's a mnemonic confusion there right but i'm using so um i'm going to make a variable called found and that's a decent name for a variable so don't get hung up on that and found is going to indicate to me whether or not i found a 3 in my list and before the loop starts i'm going to set it to false because we haven't found anything yet so found equals false and so before the loop starts found is false and now we're going to run this loop a bunch of times 9 is that true no skip 41 is that true skip 12 skip right so 9 41 12 and found has remained false because we haven't done anything to it but now in comes a three and this becomes true so it runs this code so 
found becomes true and then we print it and you'll notice that when we see a three we get true and then it runs again we get 74 it's still false 15 still false run run run quit and the residual afterwards is true and in fact if you didn't know any of this and you don't print that out all you know is that afterwards we loop through all those things and we know that there was a three in there that's what we're doing so we searched all of them we checked for threes when we found a three and you can see basically that you know the found remains false until it flips to true but then there's nothing to set it back to false there's nothing in this loop that's going to set it back to false so once it sort of catches the three then it remains true for the rest of the loop and then it just finds its way out now if you want to think about it for a moment ask yourself how might we make this loop more efficient by putting a statement right in here think think about a way to once you've found it and it's true there is sort of no reason to keep on going so what would you put there to perhaps make this loop to look for threes just to tell you whether or not there was at least one three in there how to make that more efficient just think about that okay so now let's look back at the largest value that we started out with right and so if you if you think about this let's kind of give it a uh sort of a rough rough look here largest so far is our kind of like a running total but it's our hypothesis is the best large number and we have this if statement that says if the number we just see right now is greater than the largest so far then capture it right take whatever number we saw and capture it so when we see uh nine it's better we capture we see a 41 it's better we capture it we don't capture this we don't capture this we capture the 74 and we don't capture the 15 and that's how we do it so you could think of this as better when some when the number we're looking at is greater than 
our working hypothesis of the largest we grab it because it's better so this line right here is the grab line grab it okay so then the question is how would you modify this code to teach it to find the smallest value in this list of numbers think of it as you have a starting number you have a sort of what's better in this grabbing notion how could you do that take a look [Music] okay so let's take a look so let's do a couple things like if you look at this if statement that's the better test well it's better now if the number is less than but then we should probably change this to be smallest so far smallest so far smallest so far right matter of fact that's what this is we've changed the word largest so far to smallest so far and we've changed the greater than to a less than is that going to fix it i'll give you a second to look at it pause if you need is that going to fix it is it going to find our smallest number the answer is of course no it's not so if we run this code so we set the smallest so far to negative one and it starts out negative one we run it and it's nine is nine less than negative one no it's not so after we see a nine the smallest so far is negative one now we're going to run 41 is 41 less than negative one no it is not so smallest so far is still negative one as a matter of fact it isn't the smallest so far anymore just because we named it smallest so far doesn't mean it is the smallest so far it didn't work out so well and so you see that none of these because they're never less than negative one do anything and we claim that afterwards the smallest we've seen so far is negative one and that is because of course negative one is smaller than any of the numbers that we saw so how could we fix this well if we started the smallest so far with some like arbitrary big number then it'd be better so if we made this a hundred whoops come back if we made this be like a hundred that'd be good because 
the first time through the nine would be less than a hundred so we would capture the nine and then the rest of the loop would work just fine but then what if we didn't know how big these numbers were as a matter of fact the largest so far wouldn't have worked if all the numbers were negative think about that we just assumed they were positive and so we kind of wrote lazy code that assumed all numbers are positive that might not be a good assumption depending on the numbers that you're dealing with right so maybe a hundred's a good number to start with or maybe like a thousand or ten thousand or like some number with lots of zeros in it how big should we make this and the answer is we're kind of solving this problem the wrong way and the thing we really want to do to solve the problem is to just accept the fact that if we're looking for the smallest number so far the right hypothesis is the first number and if we just knew what that first number was the nine then because it's the first number we know that it's both the largest so far and the smallest so far as soon as you see the first number but we don't know here before the loop starts what that first number is i mean you can look at it but assume this is just data that's coming from somewhere else and we don't know it until we start reading it so we have to construct a loop that deals with the fact that we want to capture the first value as our hypothesis for smallest so far so how do we do that let's take a look so what we do is we use yet another type so we have integer floating point string boolean and now we have a thing called the none type none type is a special marker in that it only has one value boolean has true and false you know floating point has an infinite number of values and integer has an infinite number of values but none type has one value none none is a constant none with a capital n is a constant the difference is that we can check to see if we have stored 
none none is often used to indicate emptiness not non-existence because smallest doesn't exist until we assign it but we're going to assign it to like a mark a flag a marker some way to say oh this is not even a number it's nothing and so this is the way you can do this so that makes a variable called smallest and then it puts none in it it sticks it right in it's not a string none it's like a special type okay so that actually captures the notion that before the loop starts the smallest number that we've seen so far is none we haven't seen any numbers okay so then we come in and we have an if statement and we have a new operator called is and is is stronger than the equal sign and so if smallest is none that becomes true it runs this case and so then what it does is it copies this first value which is nine into smallest and so we see a nine and the smallest so far is nine which is the first value and again we're assuming we don't know what the first value is before the loop starts so we use the first iteration through the loop as the moment where we capture that okay so smallest is the value and then we print it we go back up and now it runs again with 41 smallest is not none there's only one thing that's none so smallest is not equal to none or is not none so this is false so it skips over here then it asks the question is the value we're looking at 41 less than smallest well smallest is 9 in this case and this is 41 so that's false so it skips that and goes on so we see 41 we don't take it and then you can see that this will never become true again this is pretty much false for the rest of the iterations of the loop so it's just going to run down here and ask this question and at some point we see a 3 and we run this code we capture it we see 74 we don't capture it we see 15 we don't capture it so then the for loop skips out and at the end we have the smallest and actually this 
would be a good technique for the largest as well because it really is just a technique to put a marker in this variable so that we snag that first number or first whatever as we read and parse through them so the is and is not operators are very useful in python you can think of them as like the double equal sign they're asking a question and they return true or false you know blank is blank returns a true or a false but it is stronger double equal asks are these two things equal in value so just as an example if i were to say is 0 equal to 0.0 it would say yeah that's true but then if i say 0 is 0.0 that would be false so that's because these two are the same valuewise and these two are not the same typewise so is is stronger than equals meaning that it demands equality in both the type of the variable and the value of the variable and no conversion is done and so that's just a very strong test don't overuse is if you're dealing with numbers or even strings use double equals don't use is because sometimes it gets a little confusing so use is sparingly i tend to only use is on booleans and on none types i don't use is on integers and i don't use is on floats and i don't use is on strings just none or true false and also is not is also an operator so you just say blah blah blah is not none or blah blah blah is not false okay so we've been looping around and doing loops and loops of loops we looked at the indefinite loops the while loops that kind of run for a while and we looked at break and continue as a way to either escape completely from the loop or go back up and discard the current iteration of the loop we looked at none we looked at boolean variables we looked at for loops the definite loops where you've got some kind of a set or a list or some kind of sequence that you're looping through and then the concept of loop idioms where you do something at the top something to 
each item and then you sort of get a benefit at the bottom and so that gets us through iterations [Music] hello and welcome to python for everybody i'm doing a worked example my name is charles severance and i'm the instructor for the class and the worked example that we're going to work on right now is in chapter five and it is exercise one we're going to repeat asking for a number until the word done is entered and then we're going to print the total and then we're going to print the count and then we're going to print the average at the end and we're going to enter some numbers and we've got to do some error checking and we're going to keep on going so if the input is bad we'll just say invalid input and then we're going to ignore it so i'm going to start from scratch i'll start my terminal start up atom and i've opened the py4e folder and that's kind of cool because now i can do things like say new folder and say i'd like an exercise 0501 and then go into exercise 0501 and say file new file and then say file save as and put it in exercise 0501 and then name the file ex0501.py i'm going to start from scratch on this one instead of adapting another piece of code i'll say print five and i'm going to do this because now i need to get to the point where i'm in the same folder in this terminal window cd desktop py4e ex i can string these together and there i am and i say python3 ex and there i go okay so i'm in good shape so there's a couple things right now and we're going to do the total count and average and so this is just a basic pattern where we're going to need a variable for the count i'll call that num we start that at zero and then tot and i'll start that at 0.0 so that's the running count and the running total now we need to write a loop and i'm going to write this as an infinite loop while true with colon and then i'll indent and i'll prompt for a string and 
remember input gives us a string so i'm going to call this sval equals input enter a number colon space i'm going to deal with the try and except later but you can just kind of know that the floating point conversion that we're going to do is a little bit of code that will sometimes fail i'm just going to take the string value here sval this input returns us a string and i'm going to convert that to float i'm going to say print fval so i can print that out and then i'm going to do the num equals num plus one and tot equals tot plus fval now i do need to deal with the situation where i'm entering the word done now we want to check that before we convert it to a float because done well we can run this it's an infinite loop but it'll only run a little bit it won't cause us too much problem if i run python let me drag this over here and i go one two three and if i put in something bad it's running i don't have a way to get out but you can see that you know it blew up on line five it blew up right here in line five so what we want to do is we want it to say you know one two three done but we want it to detect that we've typed in done so here we'll just say if the string value that i got back from input is double equal quote done quote break so that basically will break us out and now print all done i should be using single quotes here too much java coding print all done then i'm going to say print um what do i want to print the total the num and then tot comma num comma tot over num now we've got to be careful because we don't want to divide by zero but that'll get us sort of a ways so this is going to run it's going to read these things it's going to accumulate here this is a counter pattern where we're adding one to a variable and an accumulator pattern where we're adding a value to it so now we should be able to see the done um four five and six and then done and the total of four plus five plus six 
is fifteen the number is three and the average is five and so that's really good the all done prints out i just did that for yucks and you can see the value that's coming out so that's in pretty good shape so i'm going to comment this out and comment that out so this is pretty good it works just the way we want it to work four five six but if we type something other than done then we're going to blow up in this float and so this is where we're going to have to do a try and except because we just know that this line line seven is the danger zone okay so what we're going to do is we're going to put a try in here and then we're going to indent the part of code that seems strange and then we're going to have some except code and the first thing we have to do in the except code is print out the words invalid input come back print invalid input now just like in an earlier example we have to do something here to make sure it doesn't just keep on going because fval doesn't work we're not going to see the error message that would be the traceback here on line 9 we're going to run here but we still don't want to add so this is where we can use the continue so in this code we're using both the break to say if i'm all done break and if i have a problem i'll print a message out and then i'll say continue so the continue basically says go back up to the top so that is how when we enter some bad data we print invalid input and without adding anything new you don't really see it here without adding anything new you go back up to the top and enter a second thing so now if everything is right i should be able to type bad input 4 5 6 bad input bad input done and i have a total of 15 and 3 items and the average is 5.0 so there we go that's what we're going to get and that roughly achieves the same thing and it's a combination of a loop with an exit mechanism we have some sanity checking of our input so making sure that we have some 
valid input and we catch it and we use continue to loop back up to run the next iteration of the loop and we have an accumulator pattern and then we can use the accumulated data to print what we want to print so i hope that this has been useful to you exercise 5.1 for python for everybody hello and welcome to chapter six in this chapter we're going to talk about strings and chapter seven is the payoff chapter so you know up to this point we're still learning sort of basic building blocks and actually we're going to write a real program in chapter seven so just learn this and the payoff's in chapter seven so we've actually been using strings from the very first lecture because if you print hello world well that's a string and so we've been doing things this slide here is all review we use plus to concatenate strings we use print to print them out print's just a function it takes as a parameter something strings integers etc we can put digits in strings but we can't add to them by now you've figured this out but you can use things like int to convert the strings to integers and then print things out so you know we've been doing this for a while we've been talking about strings all along now today what we're going to do is just get into strings in more detail we input data with the input function input returns us a string and if we want to input a number we have to run some kind of conversion like we have to do an int before we use this data that we read from input you know and so there's things that we've got to do and we've been doing all these things in programs so far but if we look in a little more detail inside strings we can index within strings each character so each character has a separate position and a separate index and basically the letters have positions and the positions start at zero and the best way i explain this to remember this is it's the elevators as we 
used in one of our examples a long time ago elevators in europe start at zero and so strings start at zero as well turns out in the old days there's some efficiency with the notion of lists of things starting with zero these days the efficiency isn't the issue but there's a certain elegance starting at zero even though intellectually you might think the first character in the string might make most sense to be sub one but it's not it's sub zero but just remember that and so we have this operator called the index operator and it's square brackets so you know fruit is a variable that contains the string banana and then fruit sub 1 is the character that's in position 1. now that actually is the second character i'll keep reminding you until i get tired of reminding you and so that assigns the letter a into the variable letter of course that's either a well chosen variable name or a badly chosen variable name and the thing that goes inside this can either be a constant or it can be an expression so this is x equals 3 and then fruit sub x minus 1 well that means 2 which is position 2 which is an n and so that gives us an n so the index is an operator and you can add this bracket syntax to the end of a string variable you can't index beyond the length of the string so if i say z sub 5 well there's only three characters which means 0 1 2 but sub 5 doesn't work and of course we get a happy little traceback so you have to be careful when you're starting to pull stuff out of strings although some of the things allow it some of them don't and you'll kind of get used to that we can ask how long a string is and so we use the len function we pass the string variable into len as a parameter and len gives us back the length of the string not the last position so the positions are zero through len minus one so it's 0 through length minus 1. 
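The indexing rules above can be tried out with a short sketch. The names fruit, letter, and x come from the lecture; w and zot are placeholder names assumed here, with zot standing in for the three-character string that the transcript only calls z.

```python
fruit = "banana"

# Positions start at zero, so position 1 holds the second character.
letter = fruit[1]
print(letter)      # a

# The index can be any expression that produces an integer.
x = 3
w = fruit[x - 1]   # position 2
print(w)           # n

# len gives the number of characters, so valid positions run
# from 0 through len(fruit) - 1.
print(len(fruit))  # 6

# Indexing past the end raises IndexError, which is the traceback
# mentioned in the lecture. It is caught here so the sketch keeps running.
zot = "abc"        # hypothetical three-character string
try:
    print(zot[5])
except IndexError as err:
    print("IndexError:", err)
```

Catching the error is only a convenience for this sketch; in the lecture the program simply stops with a traceback.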
so len is just another function we've been doing functions now for a while you pass in a parameter and then len does some work and out comes 6 and that goes back into x because the function has a residual value it just happens to be a built-in function and so you know somewhere deep inside python there is code that takes this and somebody wrote a loop or looked something up and then returned a return value and sent back a 6 to go into our x variable and so the function is there like i said we've been using this for a while another thing we tend to do is to loop through strings and look at strings and dig data out of strings python is excellent for doing these kinds of lookups and so we can write a simple loop we can write a loop that creates some kind of an iteration variable like index and given that we know that these positions are zero through five we can set this to be zero and then write a while loop while the iteration variable is less than the length of fruit and remember this is six so it's going to be zero through five zero through five are the very values we want to generate and then we can look up one at a time pull out fruit sub index so fruit sub zero fruit sub 1 2 3 4 5 and then print out the position and the letter and then add 1 to index and this will run 6 times 0 through 5 and that produces this output right here and so that's one way of looping through strings that is a basic indeterminate loop but we carefully construct an iteration value and work our way through that loop the other way is to use a determinate loop a for loop and generally when we're able to use a while loop or a for loop all else being equal we generally prefer a for loop and so here we have the for keyword and fruit and it's an in and so for letter in fruit well that just says letter is our iteration variable and it's going to take on the successive values of each of the characters so this loop is 
going to run six times and letter is going to be b a n a n a banana i'm always terrified when i make these slides that i'm going to misspell banana because somehow i always think that there are two n's somewhere i don't know it's not one of my favorite words to spell i actually didn't choose banana as the constant the authors who i borrowed the textbook from allen downey and jeff elkner they used banana and so i'm still using banana so some of the jokes in the book aren't my jokes they are the jokes of jeff and allen so here are just two equivalent loops you know so you can have the while loop or the for loop they sort of both do the same thing they both just print the letters out one each time through each of these loops runs six times but you can see how the determinate loop the for loop is a prettier loop unless you truly somehow need to know this number as you're going through the loop but if all you're doing is going through and you want to touch in order each of the characters of the string you then simply write a for loop because it's more elegant the less code you write the less chance there is for you to make a mistake and so the fact that these are equivalent this is three lines well two lines of a loop and this is four lines of a loop that's twice as many places as you could make a mistake because you might you know misspell index or something i mean why even make an iteration variable if you don't need to make an iteration variable and so we can do things that harken back to our iterations and loops chapter anything that you can do in those things like look for the largest letter look for the smallest letter search to see if a letter exists or say count the number of a's in the word banana and so that's what this is doing and so we have a counter so again we do something at the top of the loop we're going to do something in the middle of the loop and we're going to print it out at the bottom so we start our 
counter at zero we're going to loop through all the letters and then if the letter is a then count equals count plus one this is kind of a pattern in a loop where we're noticing something instead of like we did it earlier where we said found equals true well we're going to count them this time so if we have one we'll get one if we have zero we get zero and however many there are but there should be three because this line is going to run three times and there's three a's in banana and so this is you know a conditional within a count we've seen counts we've seen conditionals in loops in prior chapters and so again i love the in keyword in python it reminds me of set notation in algebra if you're a math whiz if you're not don't worry about it or maybe you will be a math whiz and you'll say well this set notation reminds me a lot of the in keyword in python so again it's for iteration variable letter again don't get stuck with letter i just happen to be using it here in banana and that is for each character in the string banana run this loop once changing the variable letter to be the particular character that we're pointing at and so the for is taking care of a lot for us right and so this is sort of this really smart for loop the for loop is you know both deciding how many times to run the loop in this case six and it's advancing the letter so advance print and you know decide whether you're done advance print decide whether you're done advance print decide whether you're done advance print decide whether you're done advance print decide whether you're done advance print decide you know i am now done because you know we're done with that particular string and so you can think of the for as you know magically doing all of this for you both deciding how long to run the loop and whether you're done or not and moving down through all the successive letters in the loop so next we'll talk a little bit about additional 
things that we can do with strings [Music] so now we're going to dig into strings a bit and we've already looked at how you can pull out a single character in a string and now we're going to look at what we call slicing and that is pulling chunks of a string out and again we're going to use the square bracket operator and the way i say it is s sub zero through four that's how i read this s sub zero through four so i look at the colon as through and i look at the brackets as sub and so s sub zero through four says start at position zero and then go up through but not including four right so we don't include four so that's probably the hardest part of this up to but not including up to but not including this seems counterintuitive kind of like starting at zero seems counterintuitive but after a while you'll get used to it and there'll be situations where you're writing code and you think oh that's why that works better but just for now remember it's up to but not including it's just kind of a little thing and we'll come back to when that is useful for us six through seven well that ends up being starting at six up to but not including seven so that's why we only get the p out now one thing that python is pretty nice about is it's not going to give you a traceback we might expect one for 6 through 20 because there aren't 20 characters but python is like ah that's okay we'll just let you stop at the end so we'll start at 6 and go all the way to the end oh no traceback it's almost disappointing sometimes when python doesn't trace back when you think ah you know if you're so obsessed about everything i would have traced back in that situation but hey i guess if you're allowed you're allowed and so there we go now you can eliminate or omit the first or last if you eliminate the first it assumes the beginning of the string if you eliminate the second it assumes the end of the string and why you would do this i don't
know but that's from beginning to end so it's the whole string so whole string eight through the end is thon and up to but not including two is mo all right so you get that that's pretty simple once you've got the rest of slicing and the rest of string indexing the notion of eliminating the first or the second part of the colon expression i think is actually pretty intuitive pretty nice we've already been concatenating strings together we overload the plus operator and there is no space added remember when you're doing print x comma y this comma does turn into a space but that's not what's happening here there is no automatic space being added and so we see hello and there and it just is hellothere with no space and so we just have to concatenate the space explicitly if we want to put spaces into strings you might think it's more convenient for concatenation to add a space but then you have to think well what if i want to concatenate things and not put the space in then i'd need a different operator so that's kind of why it works that way we can use in differently as a logical operator so we're using it as an iteration structure in for loops but we can also use it as a logical operator in an if statement so it's kind of like the double equals or not equals or less than or equals or something like that it's like those guys and it returns a true or a false is n in fruit so that's a question and the answer is true is m in fruit no so the answer to that question is false is nan in fruit it doesn't have to be a single character it can be more than one character and the answer is true and then you say something like if a in fruit and so this is the logical expression that returns a true or a false and yes we found it so that becomes true in this particular case so it runs the little indented bit so in is an operator in this particular situation in a
for loop in means something different and we'll use in for other things as logical operators coming up in a bit you can compare strings and this has to do with the character set of your computer the character set that python is using but in general it is lexicographically less than and lexicographically greater than uppercase and lowercase are a little weird i think when we used the max function earlier the way my computer was set up uppercase was less than lowercase and in general uppercase is less than lowercase but it's bad to assume case but there is a deterministic way to sort strings you can have something equal to or less than or greater than and all those operations work naturally with the less than and greater than you have to be aware of uppercase lowercase things and whether punctuation sorts less than or greater than letters that's kind of unpredictable and depends on the character set of your computer and it's something you just play with and figure out if you're sorting stuff by first name and last name as long as the case is the same it works if you were sorting chuck and glenn both starting with uppercase they'd sort right and both lowercase would sort right but if you instead had lowercase chuck and uppercase glenn then that would sort weird as a matter of fact the g would come before the c and so case can mess this up but in general other than case and special characters and other things it technically works it's just hard to predict a lot of what we do is use the string library and so the strings are objects and we'll talk later about what that really means and objects have these things we call methods so a string object has some built-in capabilities and one of the built-in capabilities that the string object has is here is a string object and because greet is a string object if we said type we'd see that
it was a str greet dot lower says hey dear string make a lowercase version of yourself it's like calling this function lower and passing greet into it and then giving that back to me now it doesn't actually change greet it gives me a lowercase copy so here i have hello bob with an h and a b uppercase and what i get back in zap is hello bob all lowercase and note that greet is unchanged so hello bob is still there and you can even call these methods on constants so this is a string object quote hi there quote dot lower that says call lower on this bit of string and give me back a lowercase version of it and so it prints out the return value this is like a function call a method call is a kind of special form of a function call it's a function call where you say the thing dot the function name rather than the function name with the thing passed in as a parameter like len for example is non-object-oriented len of x that's not object-oriented object-oriented would be x dot something parentheses so constants are objects as well and taking the lower gives us back lowercase hi there and so that's just one of the things that you can do in the string library these are built into string variables and constants they're just always there as soon as you make a string they're part of it and when you do type and it says it's class str we'll get to object oriented don't worry we'll get to object oriented and so you can do things like use type this used to say type str but now it's class str the word class is more of an object-oriented concept but it is a string and you can use dir and of course there's extra stuff up here and this is showing all the different methods or capabilities things we can do to strings so x dot something parentheses well what can we do there this is all of those things that we can do that are built in and come with strings when we build them and
python of course has great documentation online for all of these string methods and what they do and how they work and why they work the way they do and so here's some of that python documentation we'll look at a few of these but don't hesitate to search for python string uppercase and then you're like oh yeah that is upper right and so here's a few things that we can do some of the ones i use a lot and we'll look at each one of these things so the find operation says find me a substring within a string so find me the first n a and give me back the position so that gives me back 2 and then i can say go find a z in there well there's no z and so it returns negative 1 so that's what find does we're going to use this kind of stuff a lot and we do a lot of looking in strings and converting things to upper or lower case there is an upper method and a lower method so greet.upper means that nnn is the uppercase hello bob and greet.lower means that www is the lowercase hello bob and greet is unchanged greet is still hello bob with upper and lower because each of these methods basically says i'm going to give you back an uppercase copy or a lowercase copy of the original thing without changing the original thing search and replace is super useful and it's pretty clean here we have a string and we use the replace method in this case we're passing in the old and the new replace all bobs with janes and so that takes this hello bob and turns it into hello jane again greet is unchanged and it does more than one thing so this says go find all the o's and replace all the o's with x's and so it goes and finds two of them and then out come two x's and so replace does not just replace the first one it replaces all of them white space as we'll see is a big deal and white space is not just
blanks although that's the most common thing but it's also non-printing characters like tabs and new lines and other kinds of things and so we have a number of different ways to strip white space so here we've got some spaces at the beginning and spaces at the end and we do an lstrip and that throws away the spaces at the beginning that's the left so that's the left strip if there's nothing there it doesn't harm it rstrip means throw away all the blanks on the far end and then strip says take both sides and so that pulls out all the spaces on both sides this will be useful because sometimes when you're tearing stuff apart you'll find yourself getting extra spaces sometimes at the beginning sometimes at the end and it can be a tab or a new line it's white space space that is kind of not visible that's what white space is it's like if you were on a piece of paper it's the white space it's like x well that's not white space but right here oh that's white space it's any character that doesn't cause printing to happen if that makes any sense it's any character where nothing would be printed and there are characters like that there's even bell characters but we don't use them very much we can ask very conveniently does this line start with a particular string so this is a question that's going to return a true or false does this line start with please and the answer is true it does start with please does this line start with a lowercase p no it does not and so again you use this in the context of if something colon some block of code so we can combine these things to tear stuff out and so let's assume that what we want to do in this case is take a from line this is in email format from a mailbox and this has got the from with a space and the person's email and then an at sign and the school
they're from and a space and then the rest of the stuff like when this mail was sent and this is a real mail message from this guy stephen from the university of cape town in south africa it's really stephen and this really is the first line of a file that you'll get to know pretty well by the rest of this course hi stephen we like you you are the example in my class and have been for a long time people who actually know stephen have taken this class and they're like stephen i saw your picture in the class so if you're ever in cape town at the university of cape town say hi to stephen and tell him that you saw him in the class but okay that's neither here nor there what i really want to do is extract his school from this email line okay so eventually the data will come from files but this is still chapter six so this is the data we're going to search through and so we can say hey let's go find the at sign search up to this position and find the at sign so data dot find at sign and give me back where that's at that's position 21 counting from position zero then what we're going to do is look for the next space after the at sign so we're going to start at the at sign and tell find to start here and look forward until it finds a space so data.find look for a space starting at the position of the at sign and that'll be in position 31 so 31 is what we get in the space position so now what we have in two variables is the position of the at sign and the position of the space after the at sign now what we really want is this bit right here so we have to go one beyond the at sign and we don't want the space so we're going to use slicing here data sub at position plus one up to but not including the space oh smiley face because we didn't have to say space minus one because that is up to but not including and so we get that little bit right there so we don't have to say
-1 there because the thing that's at the position of the space is not actually included so that's already a little benefit of up to but not including and so when we print this variable host out we get exactly just the school that stephen works at and probably went to as a matter of fact i don't know if he went there or not so this is just kind of a note for non-latin character sets all programming languages from the 60s on tended to work in what we call the latin character set which is the abc character set and the special characters that the united states and england and europe and lots of places use but it's really common to want to use different characters and so if you're going from python 2 to python 3 and we'll talk about this a little later when it matters more luckily we're in python 3 and one of the big things about python 3 is that all the internal strings are unicode in python 2 there was some confusion as you went between strings and this is just a little bit of code so i'm putting some asian characters this is korean actually into x and i ask what kind of a thing this is and it is a string and then there's this unicode constant and this comes from python 2.
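a minimal sketch of the python 3 behavior described here, assuming an arbitrary korean string as the sample text (the exact characters on the slide are not in the transcript, so the value below is only illustrative):

```python
# Assumption: the sample text is illustrative; any non-Latin string behaves the same way.
x = '이광춘'
print(type(x))   # in Python 3 this is a str

# The u'' prefix is carried over from Python 2; Python 3 still accepts it
# and produces the same kind of object, an ordinary str.
y = u'이광춘'
print(type(y))

# Both spellings give equal strings, and len() counts characters, not bytes.
print(x == y)
print(len(x))
```

in python 3 the two constants are the same type, so there is nothing extra to convert when mixing them.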
in python 3 if it's a unicode constant it's still a string whereas in python 2 if you put international characters into x then it was a string and then there was a separate kind of constant called a unicode constant and it was a different type and there were ways that you had to mess with these unicode variables as you did things like read them from files and put them back into files and did other things so it was much more difficult in python 2 but we're in python 3 and python 3 natively understands non-latin character sets international asian character sets spanish french character sets and so this is a good thing for python 3 and this is one of the real benefits of using python 3 and as we start doing stuff where we're exchanging data with the outside world this will come into play and i'll have to show you how to use it there were weird things that you had to do it just makes a lot more sense in python 3 okay so we've talked about strings we learned about strings and converting them we've done a whole bunch of stuff and again we're not yet doing anything super useful we're learning how to slice and dice even though we're not making the meal yet up next we're going to talk about files we're going to read some data and we're going to slice and dice and use all the things in the next chapter that we've learned up to this point so see you in a bit [Music] hello and welcome to python for everybody my name is charles severance and i'm the author of the book and the teacher of this class in this particular session we are going to do exercise 6.5 from the textbook it's an exercise in parsing text strings and so the basic idea is we're going to see strings of various kinds in various lengths and we're going to want to extract pieces of them okay and so the idea is to somehow get this part out and then convert it to a floating point number this is a proxy for later things where we're
actually reading files or reading stuff off the internet but parsing strings is an important thing for us to do okay and so let's take a look at a couple of different ways to do this so let's go ahead and get started let's bring up atom and i've got it open nicely to the right spot here and i'm going to make a new folder ex0605 hopefully by now you're finding atom or whatever your programmer editor is a powerful tool i'll close this one file new file a powerful tool that lets you save a lot of keystrokes etc etc print exercise 6.5 just for yucks and then file save as and again until i save it it's not going to have the pretty colors i'm going to save it in ex0605 as ex0605 dot py and now it has the pretty colors and here i am now i've been doing these and so now i'm actually already in a directory so let me show you how to do relative directories so i'm in this path right here and in windows and in mac and in linux i can use the command cd dot dot that means the directory that came before and so now i'm up one directory and if i do an ls i will see that this new ex0605 that i just created in this directory from atom is already there cd ex0605 in the next chapter we'll be talking about files and this is where you really need to know this concept of folders and files so ls and i'm going to run python3 ex0605 dot py there we go exercise 6.5 so we're in the right spot we've got this going pretty soon we'll be putting stuff in these directories that needs to be there and you'll see how all that'll work in the next chapter where we've got to know all this stuff okay so we'll just grab this first line here and paste that in print str so let's run it and there we go actually there's supposed to be a space right there so i don't know why this space didn't get copied and pasted from my copy and paste so i'm going to put
that space in there's supposed to be a space right there i think but we'll see so the key thing is if you look at the lectures from this section you can look for things and you look for a pattern and so what i'm going to do is look for a pattern that says find me a colon okay and i'm going to say where is there a colon ipos equals str dot find colon and then print out ipos so i'm going to say where in this string is there a colon that's going to give me the position and offset of that so that says that the colon is in position 18 now it's not always going to be 18 sometimes these strings will be a little bit different okay so the next thing i can do is say a small piece of this string is str starting from that position ipos through the end of the string and then we're going to print that out print out the piece and when i'm doing string parsing tearing strings apart i tend to have a lot of situations where i print over and over and over again so now let's see if that piece is the right piece and the answer is it doesn't quite look right because see i've got that colon there and that's because it says start at position 18 where the colon is and then keep on going and so i need to do ipos plus one so i'll just advance past this little colon character and get into that space okay so let's run it so now i've got space 0.8475 and now i can just say value equals float of piece because piece is a string and then i'm going to say print value to see if i got the value right and let's remember that there's a space here this might mess up float i don't think it's going to mess up float because float is trying to find a floating point number but let's just see if it works okay so the key is the colon is in position 18 the string we pulled out is blank 0.8475 and
the floating point number is 0.8475 so we've sort of solved this now i can clean this up a little bit by making that plus two so i'll just change that to plus two and you'll see how that changes what i'm doing and so now this here is the string that one there is the string and this is the actual floating point number they look the same other than the fact that it's a floating point number and you can add something to it so i could do something like print value plus 42.0 and that would actually work right so 42 point and if i did print piece plus 42.0 that will blow up right because piece is a string and 42 is a float and it says can't convert float object to str implicitly okay and so other than taking out this extra stuff i'm just commenting out a whole bunch of stuff here oops so i take out all those print statements these five lines are the lines to do this particular assignment where we are tearing apart a string this is just so that we can play with strings but later we'll be taking this data from all over the place finally we're going to start opening some files and then later in the course we're going to be opening data from databases we'll be opening data from the internet and so there's all kinds of sources of data where we get these strings but for now we're in chapter six and we're only focused on strings so i hope you found this useful and coming up soon we'll be opening files hello and welcome to chapter seven this is the chapter where it all really starts to pay off we have been learning bits and pieces and doing little two lines three lines four lines of code to learn the basic building blocks of python and learn some of the syntax and define lots of terms but now we're actually going to start doing something so if you look at what we've been doing so far we're inside this little computer and the python says
what next and you give it a command and it does something and you do something else and it does something and you do this three or four times unless you write a loop and then it goes like 10 or 20 times and that's it and then maybe we write a thing that reads something from our keyboard and gives us something back and then we write something that prints a few things out and so we've been pretty much using the keyboard the screen the cpu and the memory that's kind of where we've been living and while it's important to talk to the keyboard and the screen the real world is things like databases that live out here files that live on our systems and connecting to the network and reading data from the network and so that's what we're starting to do right now we're starting to be able to work outside of our code and create things that are permanent and so initially we're going to work on files we'll later talk to databases and the network and other stuff but for now we are talking about files and so really we're stepping out a little bit and reading things that are permanent and creating things that are permanent the kinds of files that we're going to talk about mostly are text files and you can think of these as a sequence of lines in a file that are easily read by python you've been making text files all along your hello.py that file is a text file too you're using a text editor to create that file you put your python commands in a file you run those files and that's what it is and so a file can be thought of as a bunch of lines one two three four five six seven a blank line here that's possible but the reality is that these are actually just lines and we have a special character called the new line that we'll talk about in a second so to read a file you have to call the open function and open returns what we call a file handle open
doesn't actually read the file open makes it possible for you to read the file so the parameters to open are one parameter that's required which is the name of the file and another parameter that's optional which is whether to read it or write it if we're reading the file it doesn't harm it you can read it over and over if you write it and there's already data in that file it truncates it and writes something we're not going to really write files we're mostly going to read them and so you pass open a file name it gives you back this file handle and then you have a variable in which you store it i often call it fhand to be mnemonic you'll see in my code i use fhand all the time to indicate that that is a file handle and so if we were to run this in interactive mode we'll open mbox.txt and open is a function built into python and it gives us back a handle it does not give us the data and you can kind of see this when we print out the file handle using the print statement it doesn't print the lines that are in the file the lines that are in the file are sort of out there there could be like 10 million lines in the file for all we know the handle is like a little opening outside of your program and you can talk to the file by opening it then you can read stuff or if you're writing the file you can write stuff and then you close the file to shut the handle down but the handle is a thing that allows you to get to the file it is not the file itself and it's not the data in the file it's just a wrapper that allows you to get to it so if you print it out it's like that's the file we opened we're reading it and encoding has to do with the different kinds of character sets which we talked about at the end of last lecture the unicode character set etc utf-8 is a great character set it's probably the most typical character set that you will run into although you can have different character sets of files but
most of them are utf-8 so of course this is python if you make a mistake and there's a file that doesn't exist we get a traceback and it blows up we'll show you in a second how to deal with that now the new line character is an important part of file reading and in files and strings we can put the new line character in with this backslash n character and the backslash n is the character that indicates that we're supposed to go to a new line and so we have what is this well that's a backslash n that's a backslash n and so if we print it out this way we see that the backslash n is in there this is how we type it we actually type backslash n to python to indicate that we're supposed to put that there but if we do a print statement it actually interprets the backslash n so the backslash n causes this movement to the beginning of the next line now the print actually adds another backslash n at the end of this so the backslash n that we put in by putting it into this string is that one and then print always puts a backslash n at the end there's actually a way to override that backslash n behavior by putting something on the print statement which we'll talk about later now it's important to note that the backslash n is one character right and so even though this x backslash n y prints this and then print adds another new line to go down to here if you ask what is the length of this well it's only three that's because the x is a character the backslash n is a character and the y is a character so it's a three character string so the backslash n is a character like all the rest of the characters but we encode it by typing backslash n it's called an escape where the backslash is the escape backslash n is a way to say new line because we can't see it it's a way for us to encode in a string this non-printable character this invisible character it's part of white space so as
we're reading through the file we can think of it as a sequence of lines and we can read these a line at a time we can also read them a character at a time if we want but it's more common to say read this line read the next line read the line after that etcetera etcetera but the way to best think about this it doesn't really matter you can think about it as lines and we will in most of the programs that we write but realize that when we see this we see it like this it comes back to the beginning there's a character in the file at each of these points to say go back to the beginning it's like hitting the enter key on your computer and that is a new line so you have to think that in the file in order for your text editor and python and everybody to know where the lines end you put new lines in the file and that's another character so this line here looks like an empty line but really it has a single character and the character is a new line and it turns out that in a bit we're going to need to keep track of the fact that every line is ended by a new line so up next i'm going to talk a little bit about how to read files in python [Music] so we're going to find that there's a number of different ways that we can read through the file but the most common way that we're going to read through the file is to treat it as a sequence of lines and we're going to use the determinate loop the for loop to do this and so what happens here is open opens the file and gives us back the handle that handle xfile is the variable i just named it xfile that's not the data but it is a sequence that file handle represents to python a sequence that we can potentially walk through and then get all the lines and it's the simplest most beautiful elegant way to read all the lines in a file we use the for loop and we have an iteration variable this is going to take
when we talk about the file cheese is going to be the first line then the second line then the third line then the fourth line so it's like going through a string but you're going through a file now and you're getting it line by line so that's each line i just picked a variable named cheese so you didn't get confused later i'll call this line but python doesn't know anything special from naming that variable line okay and so it's the for and the in and i read this as for each line in the file the file handle xfile so run this loop one time for every line and then print it out so it's actually really quite simple okay other languages like c or c plus plus have to write while loops with end-of-file conditions and all kinds of things that make this very difficult but this is one of the prettiest things that python has it's a very very pretty thing okay so let's talk about what we might do and we're going kind of back to iterations now what if we wanted to count the number of lines in a file well this is a basic loop counting pattern so we open the file and then like in all these loops we do something to prime the loop to get it started set a variable count to zero and i'm going to use the variable line that's going to go through each of the lines in the file for line in fhand the file handle and it's going to run this loop once for each line in the file and the variable line is going to change but all i'm going to do is count equals count plus one and so that's just like from counters so every time we see a line we're just going to add one to the counter we're not printing the line we're not looking at its data at this point and then when the loop is done however many times it has to run out it comes and we print out line count and count and so if we open mbox.txt this is going to do all this work and then print this line out and say line count is 132045 so this is a little
five line program that shows you how to count the lines in a text file using python again simple and elegant and not too much syntax for you to have to learn now it's also possible to read the file as a series of characters all in one go read the whole file in now you've got to be careful depending on the size of the file this is going to lead to a string variable with a lot of data in it now if it's you know 100000 characters that's actually kind of a small thing but if it was uh you know 10 million lines that would probably not be good you'd want to read it one line at a time and process each line and then do something but mbox-short.txt is a small little file so we open it and we get back a file handle object and we call the read method and that says look go through and read all the text and give it back in one big blob one big string and i'll put it in inp and so that's where you have a line a newline a line a newline a line a newline it's not really lines it's just a sequence of characters with newlines in there to punctuate them and now you can split that later we'll see how to split that into separate lines if you want now i picked a file that was short and so this inp variable now has a string in it and i can use the len function pass a string into the len function says oh 94626 characters that's kind of a small um a small little file and perfectly okay to read it all in one go and so now i say just print the first 20 characters that's you know beginning up to but not including 20 and so it shows that the first 20 characters of that little file is a from line because this is a mailbox file now let's say we're going to do a searching and we did this loop where you're looking for something and so we're going to search for lines that have a prefix of from okay that's what we're going to do and we're gonna print those lines out so there's lots of lines in this file you know line line line line from line line line line from right on and on
and on and we only show these lines the ones that match right that's what we want to do and so we are going to write an open statement and then we're going to loop through and we're going to ask the question if the line starts with from print it so sometimes it's going to skip skip skip skip and then it's going to run it then skip skip skip skip skip it's going to run it skip skip skip and then it's going to run it okay so that's the basic idea and then it'll finish when it's all said and done and so this is like a criterion this is like a search we're looking for lines that match that have the string from as their prefix now when we look at the output of this it's kind of weird we see kind of these little blank lines that show up blank blank blank blank blank blank blank what's going on here what's going on so let's take a quick look the problem is newlines well i mentioned that the file has newlines in it and so when you do the for loop it doesn't throw the newlines away as you might expect it would be kind of nice if it did but it doesn't it actually when you read it reads that first line up to and including the newline and gives you that back as the variable so that is the first newline so that means it's going to go down and then the print statement actually adds another newline so the second line of the file has a newline at the end of it and the print statement adds another newline so if we take a look at the code there is a newline oops come back if we take a look at the code this variable line has a newline in it oops where am i at i'm in the wrong slide there we go yeah this is what i want to do if we look at the code there's a newline in here and then the print adds another newline so the print adds a separate newline and that's how we get two newlines the print statement's newline and the newline from the file here's how we fix it and you're going to write this code a lot because
when you're reading text files you end up with a newline and often you don't want the newline but thankfully as we saw in the previous chapter there is a nice little function in python for strings called strip that allows you to throw away white space and to review remember white space is anything that doesn't print and this newline is a non-printing character so rstrip gets rid of it so it's a way to get rid of white space and rstrip does it from the right end so it's the right end of the string and so if we just are going to loop through all the lines in the file we say line equals line.rstrip() and then this variable no longer has the newline at the end of it we have our little if statement and if we print it then the data has no newline in it so the print only goes down one and so now we have single spaced output and so you're going to be doing that a lot it's really common to read through a file and then just strip the newline or any trailing space off the end of that now there's a couple of ways to do a loop like this and let's just think of this as we're looking at a file with lots of different lines in it and we want to ignore all the lines except some say good lines and we want to do something with those good lines or the lines we're looking for needle in a haystack this is like searching for a needle in the haystack so if you look at this code at high level we're going to loop through everything and then we're sort of picking which lines are and these are the good lines down here now often we have a bunch more code that we want to do and we're not just printing them but we're going to do a lot of code so sometimes you actually structure the loop a little bit differently and so the way to do it and this is going to do the exact same thing it's just a little different way of thinking about this loop so the top part is the same we're stripping it and what we're
doing here is everything's the same here except we add this not if the line does not start with from that's the translation of that if the line does not start with from continue so basically we have a skipping pattern so the lines we're not interested in we skip so we come down we you know skip a lot of lines and then we find a line that's good and then we fall through so this is the good code and then we have all the other good code that we want to do to that line we have that showing up down here and so there's just two patterns two ways to do the exact same thing so another way to select the lines that we're interested in is to use the in operator so we talked before about the in operator and how that works so i'm going to use the continue skipping method so we're going to read all the lines these first few lines if uct.ac.za is not in the line skip it and so this is going to print out all the lines that have the string uct.ac.za in them and so you see this is the output of the program sometimes you'll have programs that want to read different files often i give assignments where i say show me how this program runs on the short file and then show me again how it runs on the long file just like this and so the way we do that is to input the file name instead of making the file name be a constant to the open call we make the file name be an input so we just run an input statement which gives us a prompt and then we type mbox.txt and then that shows up in this variable fname it's of course a string all the time and we pass that into open and then we open it and then we do you know the count operation so if we enter mbox.txt it counts 1797 subject lines in mbox.txt and if we give it mbox-short.txt it says there are 27 subject lines in mbox-short.txt and again this is another one of those ifs and it's just counting but only counting lines that match a particular pattern okay so now the user can also type bad file names and we need
to be able to deal with that as well and so we're making a small change to the code the dangerous code is this line right here this line right here is going to traceback if that file doesn't exist so what do we do well we're going to just expand that the rest of this program is exactly the same the only thing that's different is we've got this line we've taken out insurance on it and we know that it might blow up and so we have it in a try and except block so here's how the code runs so you know the input runs we type in a good file name it comes in here this works and so it skips the except so it runs the code and prints out the count so that's the good pattern the bad pattern is here we type in a bad file name it comes in the try except this file name is na na boo boo and it's going to blow up so this line blows up so it jumps down into the except code prints out file cannot be opened so prints this out now this quit is really important because if we don't put this quit in here it's going to continue down here and that's going to blow up here because the file handle is not defined properly at this point and so what we have is we have this quit quit is a special function where it comes in and never returns so this is a way to terminate the entire python program silently with no traceback right so we put in our own error message so we look like we're professionals say could not open this file and then we stop if you don't it's going to come down here and it's going to traceback right there it's going to blow up so the quit is useful when you want to stop executing because you've detected some kind of an error so that's a quick zoom through opening and reading through files and doing some patterns um most of the rest of the programs in this course are going to say open for rstrip look for and then do something interesting that's going to be our loop that we're going to do over and over and over again and now we see how this looping and
if and iteration and variables are starting to come together and you can actually sort of do a program that does something useful but before we get to too many more programs we've got to switch gears a little bit and talk up next about data structures and that is the shape of data and how we can use more intricate and complex variables to help solve our problems [Music] hello and welcome to python for everybody my name is charles severance i'm your instructor and in this particular video we are going to do exercise 7.1 and this honestly even though it's a really simple exercise is one of my favorite exercises in the book because in chapters one through six we've been just kind of learning the basic mechanics it's like we just um you know it says bonjour and we say bonjour and we're learning but it's not very fun because we're not really solving anything so this is our first program that's going to read a file and do something with it and the only thing we're going to do is we're going to convert it to uppercase so it's just a really simple program okay so here we go but this is important though i keep it simple because you got to figure out how to manage files and that's the payoff for everything that we've done up until now so let's close that one and say file new folder ex0701 welcome to files here we are in files file new file and i'm going to say file save as and put it in ex0701 and i'm going to call it i could probably come up with a better naming convention than these things but it works for me and i'm a programmer so here we have this file um let's start up a shell a terminal program command line in windows shell in unix cd desktop cd python for everybody folder in the desktop py4e cd ex07 tab to get me to that i'm in this working directory and i have these files in there and so i say python3 ex07 i saved this so it's an empty file python is perfectly happy running absolutely nothing because i guess you've made no
mistakes okay now here's the key thing we have in this situation we've got to have this file and this uh version of the website has pythonlearn.com this could also be py4e.com but what i'm going to do is i'm going to control click on this control click open link in new tab okay so now i have this new tab and this is the file okay and this is the file that we're going to read we're going to read this file okay lots of stuff in this file you're going to get to know this file really well the key is we've got to put this file in this directory because this program is running in this directory and this program needs to open the file and this is where if you got some magic little button and you know whatever click run python it doesn't know what directory it's running in and so we've got to get this file in the directory so the program runs in the same directory so it can open the file we're looking at right so here's the directory we want to go to and i'm going to say so i've got this file sitting here and this works for text files once they're sort of viewable in the browser i'm going to say file save page as and i'm going to go to my desktop make this a little bigger python for everybody ex0701 and you'll see this is sitting here and i'm just going to say save so that's now been saved as if i downloaded the file so let's go into the terminal and do an ls and so you see that by my action here in the browser i have saved a file into the exact same folder that i've got the code for 7.1 so now i can open this file and i can make sense of this file okay so i'm going to go to atom and you see atom even sees the file so now i can even open the file in atom and here's the file in atom atom knows how to read these text files and so away we go so now ex07.py this next few lines of code you're going to get to know pretty well and so i'm gonna create a variable called the um come back i was in the wrong place i'm gonna open the file with the open command fh equals
open mbox-short.txt now remember that open does not actually read the file it kind of gives us this little portal where we can take a look at the file and so if i print fh you might expect that will contain all this data but it doesn't it just is a file handle so let me run that python3 ex07.py okay so it shows some information this we'll learn later what objects are this is a python object that has some information in it but the information in this object is not actually the file data to read the file we're going to write a while loop i mean a for loop sorry not a while loop so we're going to say for lx again line is a good name here but i want to use a non-mnemonic variable for lx in fh and then i'm just going to say print lx okay so this is going to loop through every line in the file and print it out i'm gonna have to make this a little bigger now because it's gonna be very chatty so that is just a loop to read through maybe make this a little smaller there we go oh boy oh boy oh boy missing parentheses in call to print what am i doing wrong here well the problem is i've been using python 2 for so so long that when i'm not drinking enough coffee then i'm going to talk python 2 so the right way to do this so let me just run this in uh nevermind i won't run it in python 2 i'll just fix it because print needs parentheses in python 3.
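here is a minimal sketch of the steps just described: open the file, print the handle, then loop through the lines with a for loop and the python 3 print function. so that it runs anywhere it writes its own tiny sample file instead of reading mbox-short.txt; the sample text, the temporary path and the lines_seen list are assumptions for illustration and are not part of the exercise.

```python
import os
import tempfile

# Hypothetical stand-in for mbox-short.txt so the sketch runs anywhere.
sample_text = "From someone@example.com\nSubject: hello\nfirst body line\n"
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as out:
    out.write(sample_text)

fh = open(path)   # open() returns a file handle, not the file's contents
print(fh)         # shows information about the handle object, not the data

lines_seen = []   # kept only so the result can be inspected afterwards
for lx in fh:     # the for loop hands back one line per iteration
    print(lx)     # Python 3: print is a function, so the parentheses are required
    lines_seen.append(lx)
fh.close()
```

each value of lx still ends with the newline that was stored in the file, which is why the printed output of a loop like this looks double spaced.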
so i save it and now let's run it again okay so there you go now as we scroll through this you see right away the problem i'll get rid of that print statement right there let's get rid of that guy just delete it we know how that line works but you see this extra space remember because in the file there is a newline so if you go over here we make this wide as well there is a non-printing character at the end of every line called the newline which is the way in files we store the fact that it goes back to the beginning so it's like character character character character character go back to the beginning that was a newline you know character character character character character character character the next character is a three the next character is a newline and so it goes to the beginning of the next line so that's the newline and the print statement automatically adds a newline and so in this case we have one newline from the file and then we have another newline that print is adding okay but that's okay we can say ly equals lx.strip() actually rstrip() to strip the non-printing characters from the right-hand side and then we'll print out ly so i'll clear my screen and run that again so it does exactly the same thing but you don't see all these blank lines and so ly is a different variable i made a stripped copy of it and then i printed ly instead of lx if i had printed lx i'd get these extra blank lines and so this is a very common thing you open a file you loop through all the lines in the file and then you throw away the non-printing characters at the end of the line and then you do something with it so right now we're just printing it but what we're supposed to do in our assignment is make it uppercase and so let's just call the upper method okay make them all uppercase see if that works and there you go and so now the line has been shouted there's only one line and
everything is fine right so it's been shouted um all uppercased this little syntax here is what's called a method this is a string variable and upper is a method within the string variable that returns us an uppercase version of that and again that's object-oriented terminology and we will learn about that in an upcoming chapter but for now we just sort of type it and understand it and later we'll understand better that there's a whole series of things you can do after this dot uh for a variable in python and so there you have uh the exercise 7.1 um for the book python for everybody i hope you found this useful see you on the net hello and welcome to chapter eight we're going to talk about lists in this chapter up to now we've been talking about algorithms algorithms are the concept in computer science of using the programming language to express the steps that you want the computer to go through to solve the problem read some data convert it to a floating point number check to see if it's greater than 40 do one thing if it's greater than 40 do another thing if it's not then print out the result or uh open a file read everything if the line starts with something do something if not skip it and then add all the things up those are steps those are a series of steps and hopefully by now you're getting to the point where you have a good understanding of steps but there's a whole other side of computer programming and we call it data structures and data structures is not the steps but instead clever ways that you lay out the data and clever ways that you make sure that the data does what you want it to do and so that's what we're going to start talking about now lists are the first and the simplest data structure strings are kind of like data structures but lists are probably our first real data structure that we're going to think about and design and make use of effectively but before we talk about what is a collection we should talk about what is not a
collection so we're familiar with what a variable is we know that a variable is a little piece of memory that's got a label on it and then an assignment statement you know sticks a 2 into x and then 2 is in this little cupboard and then it goes to the next line and then 4 goes into x and so the 2 goes away and the 4 is there a key thing is you can't have more than one value in a variable at any given moment right so when we move to collections collections are more like suitcases we can put lots of things in them we have ways of organizing them and as we go through lists and dictionaries and tuples we'll see how there are different ways to organize them and as a matter of fact we've been talking about lists for a while every time we use one of these square bracket syntaxes in earlier programs we've been working with lists and so this is technically a three item list with three strings got commas here joseph is one string glenn and sally are other strings and here's another one that is another thing and the list is basically it's a list constant and it's being assigned into a variable so this friends variable has three things in it so that's different than what we've been talking about before so these bracket structures with square brackets are those lists and so the print is just a print with parentheses to get the print to work but 1 24 76 is a three item integer list red yellow and blue is a three item string list but it doesn't all have to be integers or strings python can handle different kinds of data in different positions in the list so red 24 98.6 is a three item list with a string an integer and a floating point number and while we're not going to use this too much for now this outer list is a three item list and the second item is another list so this is kind of alluding toward what we'll do when we start talking about data structures and that is we have a structure and then we have
another structure inside of it and sometimes this can get quite complex and we're doing this for a reason this here has no reason just to show you that it's possible that lists can be made up of lots of things including other lists and of course there is also the notion of the empty list and like i said i have had to tell you about lists all along we use them in for loops we can put lots of things here we can put a file handle here we can go through the file we can put a string there we can go through the characters in the string and in the list the iteration variable then goes through the successive elements of the list and that's why this prints out 5 4 3 2 1 and then the loop is done and it prints out blastoff so we've been using them and we've been actually iterating through lists with for statements all along so the for each i mean the for statement um you know has been something we use with lists and when you just need to go iterate through the list and go through every item in order uh the for is a great way to do that so friend is our iteration variable friends is our list variable and so that says friend is going to successively take on the value joseph glenn and sally and print out you know happy new year joseph glenn and sally runs three times once for each of the values and the iteration variable advances now i do want to make it really clear that the choice of friends uh and friend uh singular and plural is arbitrary and capricious it happens to be convenient and intuitive that the iteration variable is one and the list variable is more than one but python has no idea about singular and plural as a matter of fact python wouldn't care it would be totally equivalent for python to do the same thing to have the list variable be z and the iteration variable be x x will take on the successive values of these three things now am i being nice to you by calling this list friends and this iteration variable friend i am but i also
don't want it to confuse you if you're just a beginning developer so just like strings we can sort of look within lists part of the thing is when you put more than one thing in a data structure you need to get them out and so lists have positions they maintain order and so the first thing in the list is the sub zero position sub one sub two just like strings they're zero based just like european elevators they're zero based so if we take a look and we say oh friends sub one that's how i read that the little square brackets when you take a variable here and you say friends sub 1 remember singular and plural don't matter friends sub 1 means glenn because this is the zero and that's the one and then sally's the sub 2 and so that's what prints glenn out in this particular thing now lists are mutable mutable is another word for changeable that can be changed meaning that if a list has three things you can change the thing right in the middle if you want to take a look at what's not mutable strings are not mutable so if i take a look at assigning banana into fruit well fruit sub zero is a capital letter b could we imagine for the moment that we could change fruit sub zero to lowercase b well this syntax would be how you would do it if you could do it but it turns out that strings are not mutable meaning they're not changeable once you create them and that's why when we do things like lowercase or uppercase we take a look at the fruit and we say give me a lowercase copy of that and then we take the return value from this and we store that in x and that's how x becomes a lowercase banana but fruit is still the original one so fruit has not changed compare and contrast that with a list though here we have a 5 item list 2 14 26 41 63 and we're going to do the sub 2 position and the sub 2 is 0 1 2 so that's that one right there and we're going to assign a 28 into it so that 28 is going in here gonna wipe that out and put 28 in so we can do item assignment in lists by putting a
bracket syntax on the left hand side to say don't just put it in a variable put it in this position within the variable so that's what that's doing and when you print that out you see the 28 everything else is unchanged i mean the whole list is there there could be a thousand items in the list and then you're changing just that one we have a function called len we've been using this len function all along to take a look at how long strings are it counts the number of characters in the string so that's a nine character string if we have items in a list len tells us how many items there are it's not like how many characters there are it's the number of things and each thing doesn't have to be a number it could be a number a string or even another list and len is the way to say hey how many things are in there there's a range function that returns a sequence of numbers and we use it as we'll see in a second to construct specialized loops to go through okay so we've taken a look at lists and now we're going to just take a little bit of a look at some of the operations that you can do with lists um python has this as we'll soon learn object-oriented approach to its operators and the plus can add strings and it can add numbers floating point numbers integer numbers strings etc and so the plus similarly works this way with lists the plus looks to its left and looks to its right and says what am i adding and in the case that i'm adding the list one two three and the list four five six it concatenates them together in this way it sort of functions like a string and so we get one two three four five six it's just concatenate this list to another list and it doesn't change a or b just like in any kind of assignment statement calculations on the right side don't change the variables they produce a new value and then assign that into c you can also use list slicing and it's easy to remember if you remember how strings work lists work exactly the same way so it's
you know of course it's a little tricky the first number is the starting position they start at zero so one is right there so it's the zero position one position start at one right but go up to but not including three there's zero one two three so this goes up to but not including 3 and that's why we get 41 12 out of that so up to but not including i'll just say that over and over and over again you can leave the first part out you can leave the first part out here and you can say oh up to but not including four so that starts at the beginning goes up to but not including four and so that's how we get that piece right there we can say um start at the position three zero one two three start at position three and go to the end now the fact that the number three is in here is sort of irrelevant three to the end is those three numbers and then you can do the whole list with slicing as well again these pretty much are the exact same examples i used when i was doing strings they're pretty much the same there's a number of different methods and you can look up all the documentation on list i often just use the dir command to remind myself of them append we'll look at count looks for certain values in the list extend adds things to the end of the list index looks things up in the list insert allows the list to sort of be expanded in the middle pop pulls things off the top remove removes an item in the middle reverse flips the order of them and sort puts them in sorted order based on the values so let's look at a couple of these so if we build a list from scratch we have a way to ask for an empty list a couple different ways to ask for an empty list we could use just two square brackets next to each other but this is a form we call the constructor form where we say hey python make a list in this case the word list is like a reserved word to python it's really a built-in class but list parentheses says make me an empty list and then assign that list
into stuff so stuff is now a list object it's a type list but it has nothing in it and then we can call the append method stuff.append and stick book in and stuff knows how long it is where the end is and how to add something to it and then add a 99 to it and we print it out we got book and 99 reminding ourselves that while lists often have the same types of values in the various positions in the list it doesn't always have to be that way then we say oh stuff.append cookie you can keep on going and then we end up with three things with the cookie we have an in operator it works pretty much like the in operator in a string is 9 in my list and that's pretty simple and the answer of course is yes 9 is in my list is 15 in my list looking through no it's not 15 is not in my list and then there's the not in operator think of that as kind of like one operator is 20 not in the list and the answer since it's not there is true and so that's a way to just you know it's kind of like startswith or in for strings same kind of stuff lists are in order and they're sortable and so this is something that we take good advantage of uh a lot of what computers want to do is sort stuff you know look all these things up append them and then get them sorted and so there is this method inside of list that's just the sort method so here we you know put three values in positions zero one and two joseph glenn and sally and then we tell the list to sort itself and then we print it out now this actually sorts the list in place which is different than upper and lower because if you remember strings are not mutable but lists are mutable and so you say hey just sort yourself okay and so just sort yourself and then it sorts it and then it's in alphabetical order glenn joseph and sally i happen to be clever i only put strings in there and i put my uppercase and
lowercase in a very consistent pattern but the list has changed and if i look at the list sub 1 that is the second item which is joseph that prints out right down there there's a whole bunch of built-in functions to help manipulate lists the other thing i was showing was a method sort is a method that's part of list but there are other functions that take lists as their arguments we already talked about the len function tells you how many items there are then pretty obvious max it says go through and find the largest min go through and find the smallest sum goes through adds them all up and we can say let's do average by taking the sum of all of them and dividing it by the length and you might think to yourself oh wow i wish we had known this a few chapters back when we were having to write all those loops to do max min sum largest smallest etc you can kind of think in your mind that inside each one of these functions is a loop that does pretty much what you did in those chapters and part of the reason we did that back then even though these things were here was they're kind of easy loops to understand um and so uh those are there and basically that allows two different ways of building loops to do the maximum minimum now it's not necessarily all that much easier to do something using these because you either can do them the old way or you can make a list and then use these functions so let's take a look and i'll just say that these two bits of code are doing the exact same thing and what they are is they're implementing a program that's going to repeatedly ask for numbers until we type the word done and then it's going to compute the average and tell us what it is and so using sort of the stuff from uh the loop chapter we start with a total variable and a count variable set them to zero and then we read a number we check for done to break out but then we convert it to a floating point value and then we say total equals total plus value and count
equals count plus one and so this is going to run over and over and over again however many times we're going to do this and then it's going to pop out and when it's done it's going to have this value of total the running total will become the overall total divided by count and it'll print the average out okay and so that that's kind of how we would have done this before we knew how to do this with lists now let's take a look at the other one and the other one we say let's make an empty list remember this is that constructor syntax that says to python make me an empty list and assign the empty list it has nothing in it right but it is a list has nothing in it into the variable num list now we're going to write another loop we're going to this part here is the same these three lines read the number if it's done quit and convert to value but instead of doing the actual calculation right now what we're going to do is just append it to the list so the list will start out empty then the three will be in the list then the nine will be in the list then the five will be in the list so we're appending each time through the loop we're appending into the list so we're just growing the list every time we read a value instead of actually computing something with the value that we've got so either in either case we get value and in one case we append it to the list and then finally it finishes the break happens and then we just say oh hey python sum up everything in the list add these three numbers together and then take the divide it by the length of all those things and you'll have the average and so these two things give us exactly the same output now there is one difference if there was like one million or one billion numbers they actually have to all be stored in the memory simultaneously whereas here it's actually doing the calculation uh of the billion numbers and not using up so much memory for most of the things that you're going to be doing the difference in memory 
there is a difference in memory this one here uses more memory i can't draw very well but it uses more memory but it doesn't really matter by the time it's all said and done and so for you the difference between these things is not all that significant but it's important to understand that they're just two techniques to accomplish the same thing with lists so now we're going to wrap up and talk a little bit about how strings and lists are related they're sort of related in that they both have zero based positions and we use the square bracket operator to do various things but there's a lot of situations where we're looking at our data and we're combining the use of lists and strings so let me show you the first thing probably the coolest thing we're going to use it a lot the rest of the class and that is the split function so let's take a string we've got abc here it's with three words what we're interested in is the fact that there's spaces in this string and what split does is says you know i'm going to look through this thing i'm going to find the spaces and i'm going to break this into pieces and i'm going to return you a list of the separate individual pieces so read look for blanks and break it in pieces and give me back the pieces so i'll print these out and now you see that it's a list with three items with three words the spaces are gone but it's given it to us so it's like split this into words please and give me a list of individual words rather than a big long string with spaces in the middle of it and that is a quick way to go from a line to its pieces and it's really common a lot of things we're doing are like go get the second thing or the third thing or whatever so the split's really nice because then you can just grab stuff and so you say oh how many things did i get well i got three the len function tells us that and i can print the first word i got with the sub zero and that'll be like
with will be the first word because that's the sub zero position so i read something i split it i can say there's three things and i can look at the first word basically without really knowing much now if you remember earlier we used find and slicing to do a similar kind of thing but people tend to prefer the split and you can also then loop through them so you can split these things into stuff as a list of words and then go through them with for w in stuff and w is gonna take the successive values with three words and so you can make a loop by reading some data splitting it and writing a for loop and then it's effectively going through the words in that line of data and so that's a really powerful concept that we'll use in a lot of the programs that we're going to write just a couple of bits about this and how it works split with no parameters here looks for spaces but it also treats a bunch of spaces as a single space and so it's pretty smart about that and so even though this has a lot of spaces between lot and of you only see lot and of all the spaces are gone it does something special about spaces it's really white space so tabs or new lines or other whitespace characters would also qualify in split basically now you don't always have to split based on spaces and a lot of data that you're going to run into you're going to want to split on something else and so here's some data that looks like we're using semicolons to separate the first second and third piece now if you just call split split's looking for spaces and so split gives you back a list of the things broken apart with spaces but there's not a single space in that line and so we get a list see it's a list but there's only one item and the semicolons are sitting there split doesn't go like oh this looks like it should be semicolons you know split's job is to use spaces and split the string based on spaces okay but given that
this is something we like to do you can tell split what character you'd actually like to split on now it's not quite as clever when splitting on something other than spaces it doesn't collapse them you know if there's a bunch of semicolons in a row it still thinks of each of those as a splitting point but in this particular case when there's no spaces then you know it's going to split that so it says split this based on the semicolon instead of being based on the space and so if you take a look at what comes out of this we split on semicolon now we have a three item list and we get first second and third and a lot of your data comes out of some logging system or some router's status updates who knows what you're looking at but the delimiter is often something other than space and you can do that with split so this is a useful thing when parsing things like our email address right we wanted to get things like the email address this second piece off of the line and so we can use split to take advantage of this and so here's a little loop that's just going to print out not the email addresses but instead the day of the week we're going to print the day of the week out for all these things how do we do that well we can observe really quickly that if we split based on spaces it's the 0 1 2 it's the sub two position so we can quickly write a bit of code that you know opens the file then loops through the lines we do this all the time now rstrip takes the newline off the end of the line we can check to see if it starts with from space right from space is our key so we're ignoring all of the lines that don't start with from space but then we find the line that starts with from space and we split it and then we just print out words sub two and so we get the third word of the lines that start with from and that's how this thing works now sometimes we want to dig in deeper and we will take something split it and then split another piece
of it again with a different delimiter so let's just say that the thing that we want to achieve is getting the part after the at sign for email addresses and we did this with again find and pos and stuff like that but you can use split to do this as well so the first thing we're going to do is we're going to take this line we're going to split it based on spaces right chop chop chop chop chop and the fact that there's an extra space there doesn't matter split happily just like zooms through that and then words sub 1 counting 0 1 words sub 1 is this email address so we'll put that in a variable called email and so email will be a string that's just this so in two lines we've pulled out the email address the second word into a variable then what we're going to do is we're going to re-split that we're going to take this string we've got and split it based on the at sign because we know it's an email address so we get a new set of pieces the first part is the person's name and the second part is the host name that their email is hosted on and then we just happen to know that this is the zero item and this is the one item so we can get at that so the interesting thing going on here if you think back to how we did this before with find and pos and all that stuff is it's really a lot cleaner for me i can look at this after you understand it and it's easy for me to understand that it's correct whereas with that pos stuff you got to add one and start the second find after it just remember that and this is a lot cleaner way and this is a more typical way of pulling this kind of information out of a line so in this chapter we've talked about lists we've talked about the concept of collections that's our first data structure we're not just doing algorithms we kind of know algorithms now but now we're going to do data structures and this chapter and the next two chapters are our foundational data structures and then we'll
like everything we'll make more complex data structures by composing those data structures together we've looked at how strings and lists connect together and how split works and these are all really powerful tools that we're going to use going forward [Music] hello and welcome to chapter nine now we're going to talk about python dictionaries python dictionaries are probably the thing that most programmers love the most about python because they're very powerful they're like a little in-memory database it's the second of our kinds of collections and probably the best collection to review what a collection is it is a situation where we are going to have a variable like a list or a dictionary that we can put multiple pieces of information in rather than a single piece of information and of course prior to collections we would put something into x and then we put something else into x and it would be overwritten and uh now with lists we can append things on to the end and so if we compare lists and dictionaries the list is sort of the organized version of the collections it everything stays in order you add something it always adds to the end you take something it sort of compacts itself it's zero through the n minus one where n is the number of items and so it's very organized kind of like a pringles where the potato chips are nicely stacked um dictionaries are messier you can put things into dictionaries there's no real sense of order in dictionaries everything has a key so you sort of throw things in and they kind of mix around in there somehow and you pull things out based on the key it's like you you sort of stick a label on it you know where you say okay i'm going to take this thing and i'm going to put chuck on it and i'm going to take uh these sunglasses with the chuck label and i'm going to throw it into the dictionary and i'm like hey give me back chuck and like oh here's your sunglasses because you mark everything this is like the key this is the value i 
took a pair of sunglasses and i threw it in so it's kind of like a purse or it's a sort of like a mess and so the idea is is you have these labels that you put on everything that you're going to throw in like i'm going to put i hope it won't stick to my keys i know what else do i got here i'm going to stick a label on my pen a chuck label and i'm going to store a pen in my dictionary with the chuck label and so it's like having a a purse or a bag or a backpack where you have things labeled and you can you can throw things in and label them and you can shout into your bag and say give me the calculator or give me the candy or whatever that is that you have labeled them you have to come up with the labels and then you can use the labels to get things back out and like i said they're probably the most powerful thing and and they're basically this concept that's generally referred to as associative arrays which means they're like lists but they have these keys and so the associative means the association between a key and a value whereas in a list there's a position in a value the position is less powerful and less flexible most modern programming languages have this notion of associative arrays if they don't they're sort of unpopular because uh once you get using them they're like whoa they're so powerful if you ever find yourself in a language that doesn't have them you'll you'll freak out they get have different names like property maps or hash maps or property bags depending on the language you're using but they all are the same thing they're key value pairs so the idea of a dictionary is that or the idea of any collection is putting more than one thing in and then the difference is is that you have ways of of indexing it so this basically line says let's make ourselves a dictionary just like we constructed an empty list and i want to store 12 into this dictionary and i want to label it money and so on the left hand side when we use this money that's the label that 
we're going to give it and so 12 is being placed in the dictionary that's like taking the 12 throwing it in the dictionary with a label of money three is going in with the label of candy and 75 is going in with tissues we say what's in there and there's no order to it and sometimes the order can even change inside of a dictionary although there are more advanced versions of dictionaries that maintain some kind of order but for now let's just not worry about the ordering of them if we say what's in there you say oh there's three things in there there is 12 75 and 3 stored under the keys money tissues and candy respectively we can ask using the index operator what is purse sub candy that's like saying hey give me back candy and out comes the number three we can update stuff so we can say go grab the candy value add two to it make five and then store that back into candy and so now we see that candy has been updated to be five and so if you look at the difference between lists and dictionaries they both can have new items added to them we haven't talked a lot about deleting but items can be deleted from them the difference is the indexing mechanism how we store things and how we look things up so we make an empty list we make an empty dictionary we add 21 to the end and we add 183 to the end and we ask it and it says oh position 0 is 21 and position 1 is 183 we don't see the positions when we print it out because it's sort of implicit here we're going in and mark 21 with age and stick it in and mark 182 with course and stick it in and then we're going to print it out and there we go course and age mapped and we can take a 23 and stick it back in age and that overwrites so that 21 becomes the 23 we can do the same thing in the list except we say lst sub zero because in lists the indexing is positional and so this 21 becomes 23.
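The list and dictionary comparison described above can be sketched roughly as follows. This is a minimal sketch, not the exact code from the slides: the variable names `lst` and `ddd` are assumptions, while `purse` and the values are taken from the narration. Note that in current Python versions (3.7 and later) a plain dictionary does keep insertion order, so the printed order will match the order in which keys were added.

```python
# A list is indexed by position, which is assigned implicitly by append().
lst = list()
lst.append(21)
lst.append(183)
print(lst)          # positions are implicit: 21 is at 0, 183 is at 1
lst[0] = 23         # overwrite by position
print(lst)

# A dictionary is indexed by a key (a label) that we choose ourselves.
ddd = dict()
ddd['age'] = 21
ddd['course'] = 182
print(ddd)
ddd['age'] = 23     # overwrite by key
print(ddd)

# The purse example: store values under labels, then update one of them.
purse = dict()
purse['money'] = 12
purse['candy'] = 3
purse['tissues'] = 75
purse['candy'] = purse['candy'] + 2   # read the old value, add 2, store it back
print(purse['candy'])
```

The only real difference between the two halves is what goes inside the square brackets: an integer position for the list, a key for the dictionary.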
and again you just look at them and you can think of each of these as pretty much doing roughly the same thing except the indexing mechanism the values are the same but the keys are different so in lists the keys are always the position and you don't get to assign those other than the fact that the order in which you put them in implicitly assigns a position and in dictionaries the key is a string you can actually use other things i use strings a lot in this lecture but that just kind of keeps things simple until you get good at it you can actually use numbers as the dictionary keys if you want but the values are things you put in and manage in those dictionaries so just like lists we have dictionary literals and what's nice about dictionary literals is that they use the exact same syntax as the printout and so it starts with the curly brace ends with the curly brace and then has a series of key colon value key colon value key colon value and this is sort of the associative array bit we are associating one with the key chuck we are associating 42 with the key fred we're associating a hundred with the key jan then we print it out and it kind of looks exactly the same and so the print statements in python are nice in that you ask what's in a thing and it shows you the stuff in the syntax that if you type that into python that would be how you do a constant and if you just say empty curly braces you've seen me also do dict this is the constructor where you say make a new empty dictionary this is an empty dictionary constant these two things are pretty much the exact same thing the empty curly braces is a shortcut to do the construction so up next we're going to talk about sort of one of the really common applications of dictionaries and that is counting so now we're going to talk to you about one of the common applications of dictionaries and that is making histograms it's
counting the frequency of things and so if you think of a histogram you know it's a little graph and there is you know how many a's how many b's and how many c's and a histogram says oh there's this many of that and this many of that and these are like buckets these are frequencies and this is how many times it happens so that's a histogram but we're going to do this thing where we're going to take people's names and we're going to kind of count how many that we see but the interesting thing that we're going to solve just like many of the things in the computer is we can't just sort of look at the data we've got to look at the data iteratively one piece of data at a time so i'm going to give you a little problem okay i'm going to show you a series of names one at a time and i want you to count for each name make a little bucket and then keep counting how many things for each of the different names okay you'll notice that you have to start with one and then you move across so just watch this and tell me what's the most common name of the set of names i'm about to show you and how many do we see so what was the most common name and how many times did you see it that's the question now here comes the reveal so for humans it's so much easier for you to just look at this and you think how did my brain look at that and you're like okay what is pretty common oh maybe chen is common oh no maybe jen is common one two three four yeah anybody else marquard's got three csev and so you'll notice how our minds without computers we just sort of like bounce around branch and bound we have hypotheses and then we decide yeah it's jen that's it and there's four of them now how did your brain think about this as we were going through them one at a time well my guess is if you really had to do this a lot you would make a little picture like this and then what you would do is if you saw a new name you know x y z
you'd add it to the list and give it a tick mark of one and then if you saw like c seven again you give that a tick mark and if you saw x y z again you'd make a tick mark and then you'd make you'd keep adding to these tick marks right and that's how you would do it and you wouldn't like many of the things we do in a loop you wouldn't really know what the most common was one until the end and then you'd sort of take a look at these numbers and you say okay that's the most most common number and then you'd you'd be done but you have to watch them one at a time you can't just bounce around and so that's how we're going to use dictionaries to achieve that again instinctively as humans we just look at the stuff but if you add a million things you probably want to write a python program and use dictionaries and so this is the idea and there's two basic things that happen one is the first time you see a name you say is this name there already if it's there already you really just want to add one to it right that's the adding of a tick and or you want to see for the first time you know blah blah blah blah and give it a one and so you can use the name as the key and then one is the value and then first time you see chen you stick one in there and so at this point inside the dictionary sort of dynamically adding as soon as it sees a new name it adds another slot in here but then if you see the same name again like chan again then you end up with a one add one to it and so it's two and so at that point chen is two and so you can see how you can both extend the dictionary by encountering a new name or adding when you see a name that you've already seen before the problem with dictionaries is like everything in python there are rules about what you can and can't do and one of the i think kind of frustrating things about dictionaries is that you can't just look for a key that doesn't exist so this is a fresh brand new dictionary we do a constructor there and we print out sub 
csev and boom it blows up and that's bad but we can solve this with the in operator the in operator we've used in the for loop we use it in lists we use it in strings so that is a question it's saying is csev in ccc well this is this empty one and so the answer is no csev is not in ccc and so using this in operator we can avoid the traceback we can say if it's not there put it in if it is there add one to it and that leads us to this bit of code okay and that is the kind of code that we're going to build this is going to be histogram code okay and so this is going to have name as our iteration variable and names sorry i made them singular and plural that's nice but so name is going to be csev chen csev now normally we'll be reading this from a file but for now we'll keep it easy we're going to go through this and we're going to have counts as our dictionary so that starts out empty and we're going to do a simple if then else every time through the loop if the name we're looking at is not in the dictionary already as a key then set it to be 1 otherwise go get the old value counts sub name and then add one to it and stick it back in so this line right here is adding a new thing and this line right here is adding to existing things and you do this long enough you start with an empty one and you do this long enough at the very end it will print out the histogram that you're looking for and so you say oh we've seen csev twice jen once and chen twice and so that's the idea and so this can run a million times if you want now this notion of checking to see if a key exists and doing one thing if it doesn't exist and doing another thing if it does exist is such a common practice that the dictionary object has this method called get that collapses these four lines into one line and so the idea is you're going to do one thing if it's in there and you're going to retrieve the current thing
otherwise you're going to pick a default value in this case we'll pick one i mean pick zero this is like the default right meaning what to use when it's not there and if you say counts now counts is a dictionary dot get that's like string.upper that's a method you give it a key and then a default and if the key exists you get back what's in the key if the key doesn't exist you get the default okay and with no traceback this works so the best way to think about this is those four lines are equal to that one line because x is either going to be whatever was in there before if it exists or it's going to be 0. now the nice thing about 0 is the next thing we'll do is we're going to add 1 to it so that's going to get us to 1. so collapsing that loop that we saw before we can make it just a one-line loop and this will become an idiom this will become something that you will get used to and you will use over and over and over again and right now you're looking at it like boy that's a lot of syntax and whatever after a while you just type this and not even think about it it's an idiom basically included in this idiom is how to both create new entries in dictionaries and update existing entries by adding one to them so everything else in this is the same name is going to go through these five values we're going to say counts sub name equals counts dot get name comma zero plus one and so if for example this already has a one in it then this is going to be 1 plus 1 becomes 2. if it's not it's going to be 0 plus 1 equals 1.
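The two counting patterns described above can be sketched side by side. This is a minimal sketch under assumptions: the five names in the list are illustrative stand-ins for the names on the slide (the captions garble them), and `counts_if` and `counts_get` are hypothetical variable names used only to compare the two versions.

```python
# Illustrative names only; the exact names on the slide are an assumption.
names = ['csev', 'cwen', 'csev', 'zqian', 'cwen']

# Version 1: check with "in" first, so a missing key never causes a traceback.
counts_if = dict()
for name in names:
    if name not in counts_if:
        counts_if[name] = 1                      # first time we see this name
    else:
        counts_if[name] = counts_if[name] + 1    # seen before: add one

# Version 2: the get() idiom collapses the four-line if/else into one line.
# get(name, 0) returns the stored count, or the default 0 if the key is missing.
counts_get = dict()
for name in names:
    counts_get[name] = counts_get.get(name, 0) + 1

print(counts_if)
print(counts_get)

# get() on a key that is not present returns the default instead of failing.
print(counts_get.get('nobody', 0))
```

Both loops build the same dictionary. The default of 0 works because the very next step adds 1, so a name seen for the first time ends up with a count of 1.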
and so this is the idea of if new set to 1 not 0 set it to 1 because the first time you see something the count should be one not zero so that's why we make this default now the get can be used for anything it just so happens that zero is a common default because it's really common that we're using this to basically make a histogram right a little histogram of a b c right and so we need to make a d and then but then the histogram has to start at one so that's basically the simplified counting with get and you know there's a lot of things that we're going to do inside of python that do have to do with frequencies and how many times certain things happened and this pattern is a really good pattern to absolutely know [Music] so now what we're going to do is we're going to switch from just looping through strings instead loop through files and we're going to it's going to take a little bit of work because we have to open the file and we'll bring a lot of things together at this point so here would be another task and that is here's a bunch of text from the book and uh you can just split this into words and count and find out what the most common common word is and how many times it uh how many times it occurs so go ahead and try to do this for a second feel free to pause actually don't bother pausing this is too hard we should write a program for this it's not it's not easy humans don't like this it makes you concentrate and so here is a counting pattern where we're going to take a line and then later we'll read this in a file and so we're this is just an adaptation improvement of the previous thing so we're going to start with an empty dictionary we're going to ask for a line of text and read it in and then we're going to use split so remember the list of words well what we're going to get here is a list of words we'll print it out and we'll run this counting this is the this little loop for every word in whatever this was we're going to do this idiom of at either 
adding a new entry or adding one to an existing entry and then printing that out so let's take a look at what we get there so if we run this we can give it some text and i've got this this would be all one line and then it splits it into words and you see that these words here are split split split split i mean that's strings and splits remember strings and lists and split and so now the counting is going to go through this list the clown ran after the and it's going to build a histogram the clown you know one clown the up up up of these things are going to go up right that's this histogram and then when it's all said and done we end up with the histogram and so counts is the dictionary that ends up with a histogram and we can so by inspection see oh the is the most common word and there are seven of those right so if we sort of take a look at this we start out we make a dictionary we read in a line of text the text goes in we and then we split that and we print the words out so these are the words right then we have a for loop that's going to loop through all those things and then produce a dictionary and when we print the dictionary out that's what we're going to get and the seven okay so that's one line of text that's how you walk across the words in a line of text after you've split the line into separate words so now we're going to look at ways that you can loop through dictionaries we just produced a loop that can build a dictionary but now we're going to going to look at a dictionary and so we'll start with a very very simple example and then we'll work to a slightly more complex example so here's a dictionary just the constant chuck is one fred's 42 and jan's 100. 
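The word-counting pattern just described (read a line, split it, count with the `get` idiom) can be sketched like this. It is a minimal sketch: the sentence is hard-coded instead of read with `input()` so the example runs on its own, and the exact sentence is an assumption based on the clown example mentioned in the narration.

```python
# Assumed sample text; in the lecture the line is typed in by the user.
line = ('the clown ran after the car and the car ran into the tent '
        'and the tent fell down on the clown and the car')

# split() with no arguments breaks the line into a list of words on whitespace.
words = line.split()
print('Words:', words)

# Build the histogram: each word becomes a key, its count is the value.
counts = dict()
for word in words:
    counts[word] = counts.get(word, 0) + 1

print('Counts:', counts)
```

By inspection of the printed dictionary, `the` has the highest count in this sentence, which matches the narration.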
and so we're going to use a definite loop with for key in counts now it doesn't have to be named key but key is a good name because these are keys and values k v k v keys and values i just mentally think of this as keys and values and keys and values so this iteration variable is going to walk the keys it's not going to walk the values it's going to walk the keys chuck fred jan not necessarily in that particular order as you see it goes jan chuck fred because just because i typed it in this order it's not like a list it doesn't stay in that order it might move around a little bit as we add data to it or as we set the data up and so in the loop you can get the key and so that's what prints out the jan chuck fred but then you can also get the corresponding count for each one of these by just pulling it out of the array i mean pulling it out of the dictionary right and so we can pull out the corresponding value and so we print out jan 100 chuck one fred 42 and that runs this loop three times so if you just use the in and you give a dictionary here remember all the different things we've been able to put there on the end of a for loop and a dictionary is another thing we can put on and we get a list of keys now there's a couple of methods that allow us to get the keys and so we can say turn this into a list and we get a list of the keys so this is a dictionary the same dictionary we get a list of the keys you can also get a list of the keys by using the keys method so that's take this dictionary jjj and give me all the keys which gives me a list which is kind of the same thing and then we can ask for the values and that gives me just the values extracted out of this dictionary so that's nice now the one thing is that while i said you can't predict the order if in two statements you ask for the keys and then the values they at least come out in the same order even though
you can't necessarily predict the order that they come out they come out in the same order and then there is a third thing that we can do and that is ask for the items we can say give me the items and that gives us a list this is our first really kind of composite combined data structure where it is a list a three item list zero one two and inside that there are what are called two-tuples jan maps to 100 chuck maps to one fred maps to 42. coming up next we're gonna have a whole chapter on that and so just take a look at that for the moment and we will come back to that in some detail later this whole items idea that gives us back a list of key value pairs because it's not just a list of keys or a list of values it's actually a list of key value pairs allows us to write in python a very clever and elegant loop what items gives us back is a list where each item has a key and a value and we can actually take two iteration variables for aaa comma bbb this is two iteration variables and if you're coming from another programming language this is super cool and it's a python only feature i never have seen another language that's capable of doing something this simply and this elegantly so what this basically does is it says we're going to simultaneously advance these two iteration variables so this is going to be the key and the value the k and the v the key and the value is going to be chuck one then they're both going to advance fred 42 jan 100 and so that means in this simple loop if we just print them out we're going to get the key value pairs of course in the order and so it's sort of aaa and bbb simultaneously walk down these key value pairs and so that's really pretty and it makes for a very succinct loop the syntax is a little sort of disquieting when you first see it but it's a super elegant thing and you just have to say items if you don't say items you just get the keys if you say items you get the key value pairs and you have
to have two iteration variables if you don't have two iteration variables and use items that will complain and say what are you doing i'm giving you two things and you don't have two variables to receive them so two iteration variables and items are basically related now we're going to take a look and this is code that i showed you perhaps many weeks ago about i said this is a little story about how to read a file and count all the words in the file and now we're back to it and at this point you should understand every single character of this program every single concept of the program you should literally stare at this and look at it code it play with it until you absolutely understand it so let's take a look again i showed you this weeks ago so we're going to ask for a file name then we're going to open the file name then we're going to make an empty dictionary again this is all stuff you've done before and then we're going to have an iteration variable that's going to go through the lines in the file right so line is going to go line line line then we are going to split that line each line into words chop chop chop chop so that's words is the list of the words in one line we're inside of a loop that's going to go through all the lines and then what we're going to do is we're going to write the have the word iteration iterate through each word in the line and then we're going to do is take each word in the line and we're going to do this histogram right so we're going to this this is going to run not only just for every line but for every word in every line so we have a nested loop for every line then we split it and then we go across the line so it's almost like a typewriter where we go so it's like the outer loop is going down down down the lines and the inner loop is going across to cross across the words and eventually we are going to see in this middle in this last line every single word in the file and we're going to do the counts get word plus one which 
is our magic histogram making line and if you don't remember what that is go back a couple of slides i just talked about it at this point in the code and it's important to be able to draw these lines at this point in the code you have the histogram and it's in the variable counts now we want to find the largest one we have written loops that can find the largest in a list but now we want to find the largest value in the key value pairs of a dictionary so we're going to keep track of what the largest count is and the word that has that count we're going to set them both to none because we have to prime our loop and so then we're going to write one of these cool things that says for word comma count so word and count are going to go through the key value pairs because we've got items here so it's going to loop through each key value pair there could be a million words in here we're going to go through every one and what we're going to do is make sure that big count is the largest count we've seen so far and if it's none well then we haven't seen anything or if the count we just read is greater than the big count so far we're going to jump in and this is sort of like oh this is a new personal best count for this particular data set and so we're going to remember the word in big word and we're going to remember the count in big count so this is just a max loop it's a maximum loop with the extra thing that in addition to recording what count is the largest we're recording the word that was associated with that count so again this is the starting part of the loop we're going to do some work and then when we exit the bottom of this big word is going to be the word that is the most common and big count is the number of times and so if we run a file we say
oh in that file to is the most common word and it's 16 times if we run the clown file well the is the most common word and it's seven and so this could take a very large file and give you the most common word and so that is sort of a really good application of dictionaries so dictionaries are the most powerful collection we've seen so far um it is good to see both lists and dictionaries to understand what collections are they are things inside of python that can handle more than one item inside of them and we'll learn about another collection tuples in a second just understand the get method because that leads to very compact code and understand the various ways to iterate through dictionaries and so we've learned a lot but in the next section we will learn even more and put these together and do some sorting and do some other stuff and really start to see the real power of dictionaries [Music] hello and welcome to python for everybody uh i'm going to do some coding it's related to the uh dictionaries chapter chapter nine and we're going to do some word counting that's uh basically right out of the slides but i'm gonna just write the code in front of you rather than uh have you look at it in the book so what we're gonna do is i've got my text editor up here and uh let me start by making a new folder for my chapter 9 exercise and then i'm going to go and make an untitled file that was from the previous one and i'll do what i always do print hello and save it here into ex09 as ex09.py so now i have a folder that's in my py4e folder uh and that happens to be on my desktop py4e is my folder on my desktop and now i have all of these subfolders cd ex08 ls is dir on windows ls oops i've got to go up one ex09 ls so i've got that file right there now i'm going to want to read some files and so i'm going to bring some files down a couple of files
um python for everybody code 3 intro.txt so i've got this url and i'm going to save it save page as and it's really important that i save it in the same folder where i'm going to write my code just so that when i open this file it knows where it's at so i've saved that one and i'm going to also take this clown text i'll use this to make my life simple so i have a real short thing that i can use to show you how it works and so now if i go back to my terminal i see i've got ex09.py intro.txt and clown.txt okay so let's go back to my text editor and get started uh i will prompt for the file name fname equals input enter file colon space now i'm going to do something if the length of the fname that i just read is less than one i'm going to say fname equals clown.txt i do this so that i can just hit enter and it defaults to clown.txt if i want to give it a different name i can so if i just hit enter at this prompt then this will give me a string that's zero length so if it's less than one i'll just assume that so let me open that handle equals open fname and let's read through it for line in handle we'll strip it line equals line dot rstrip to take the white space off the right hand side and then we're going to say print line again i'm not just doing this for you when i write code i just saved it when i write code i do this kind of stuff all the time just for my own sanity checking and so now i'm going to run python3 ex09.py just to test that i'm going to hit enter now and it's going to assume hopefully clown.txt if it all goes well and yep it read one line okay so that part's working i'll just leave that print statement in the next thing i want to do is kind of a classic thing where we're going to go read a bunch of lines and then go horizontally across those lines in words so i'm going to split that wds equals line dot split and print wds so i'll print that and i'm going to save it and test it i really love to test things over and over there's the actual
line this file clown dot txt only has one line and it breaks it into words and so i have those words let's just run it again with intro dot txt so this will have a lot of lines line line line line line lots of lines every line has a prints out the line and then prints out the words that we split it into okay so now i kind of one of the things that i do here is i want to believe now i sort of can believe everything from here up like oh it's going to open the file it's going to read through the lines i'm going to split them into words and so then i'll just kind of behind it i'll just say okay i'll just i'll just comment that out now i needed another for loop for w in wds now words is a python list and has some number of words in it 0 or 12 or whatever was on the line and now i'm going to print out the word okay and so now we'll go through that horizontally i'll just do clown.txt so that you see i i'm not printing the line out that's the words that have been parsed from the split from the line and now we got this loop now one other thing that's interesting is just to make sure that you're you're going through all the words and i i like a print statement here to know that w is going to successfully take on literally all the words of this file so if i comment this print statement out and i run it again clown.txt that for loop starting from here is every word in that file which happens to only be one line but now if i do the same thing for intro.txt it's just going to go through the words and in a sense by nesting these two loops we're going to hit all the lines and that's a lot of stuff but it hit all of the lines all the words and away we go okay so here's where dictionary comes in i'm going to make a a variable called di for dictionary and i'm going to say give me a dictionary now d-i-c-t is not something you can choose that's saying make that's that's defining the type of dictionary d i is a variable that i chose okay so the key thing to this dictionary is we're 
going to make a counter and we're going to use w the word absorb or elegant or whatever and we're going to use that as the index so the simple thing to do is to say if w is in um di then we can say di sub w i mean the dictionary sub the word which is our key in the key value store of the dictionary is equal to the value that we had before in that spot di sub w plus one and if it's not in there else di sub w equals one and i'm gonna print new so every time we see a new word it's going to say new and i'm going to also then print w and the current value of the counter for w as it's going through now notice how far in i'm indented this is all part of this inner loop so this is the loop that's going to run for every single word okay and i'm going to run this first with clown so it runs slowly okay so we saw the it's new and the count is one clown is new count is one ran is new the count is one after is new the count is one now we saw the again but now we made the count be two let's print here i'll say existing so you can kind of see it now in the print i'm printing this let's make it even a little more verbose print w and then i will make it so it prints the word before and the count after and then whether it's existing or new so we'll put a lot of print statements in print statements are cheap okay so now we see the word the it's the first time we see it and we set it to one we see clown the first time we see it we set it to one we see ran new one later on we see the it's already in so existing means it was already in the dictionary w as a key was already in the dictionary okay and so that's why we added one to it so the old value was one and then we did di sub the equals di sub the plus one w is the string the t h e that's what that string is okay and so we've made it all the way through and you see the in this one line occurred ultimately seven times so now i want to
print out the contents of this dictionary at the very end of both loops so i've got to de-indent twice and so that will give us the counts and so this is what we get when it's all said and done you know the happened seven times but it just worked its way through okay so you got that now this is a pretty verbose way of doing this but i did it sort of the slow way to show that there are two situations if it's already there you increment it and if it's not there you set it to one effectively inserting it right so you insert it and set it to one with this di sub the equals one okay but let's get a little less verbose here get rid of some of these print statements because we kind of covered all that um get rid of this line and go back to printing w and di sub w at the end we'll leave that one in so what i want to do is i want to look at this bit of code right here this if w in di else we do this so much with dictionaries that there is an easy mechanism to do this that combines these four lines into a single kind of contraction and so i'm going to do this i'm going to print let's put two stars out then the word and di dot get of the word comma uh negative 99 okay and so this di dot get of the word is the important part the way it works is this is a dictionary and dot get says its first parameter is the key to look up which is a word like the or fell or clown or whatever and negative 99 is the default value that we get if the key doesn't exist so this is in effect an if then else right this little di dot get w comma negative 99 is you know if it's in there do one thing if it's not in there do something else okay so let me show you how this works and you'll see when the negative 99 will happen okay so the first time we see the get returns negative 99.
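A minimal sketch of the lookup just described. The variable name `di` follows the walkthrough, but the sample entries are made up for illustration. `get` takes a key and a default and returns the default when the key is missing, so it behaves like a compact if/else:

```python
# Hypothetical sample data; only the name di comes from the walkthrough.
di = {"the": 1, "clown": 1}

# Key present: get returns the stored value.
seen = di.get("the", -99)

# Key missing: get returns the default instead of raising KeyError.
missing = di.get("car", -99)

# The same lookup written out as the longer if/else that get replaces.
if "car" in di:
    longhand = di["car"]
else:
    longhand = -99

print(seen, missing, longhand)
```

Note that `get` only reads from the dictionary; it does not insert the missing key.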
all right so let's move it over here the first time we see the the is not in the dictionary so this di dot get of the word the gives us back the negative 99 okay and this still is working and so the is one clown is whatever but away we go okay let's do it this way let me comment this out let me comment this one out and run it again so it's a little clearer what's going on okay so the first time we see the the is not in the dictionary the first time we see clown we know it's negative 99 negative 99 but here we asked for it and the is one because we've seen it before and so this get mechanism allows us to get a value out if the key exists and specify a default if it's not there so i'm going to go old count equals di dot get w comma 0 so instead of using negative 99 here i'm going to just get rid of all this what i'm saying is look up in this dictionary get is a function that's part of all dictionaries look up using the key w which is the and if i don't get it give me back zero and so i'm gonna say print the word comma old comma old count and now whatever the old count is it's either the value that was in there or zero and now i can say new count equals old count plus one and i can say dictionary sub word is equal to new count so i'm going to get rid of this if then else this is basically saying look up the old count that we have if you don't find one use a zero we'll print that out and then afterwards i'll print the new count and so we'll print the old count here and you can see the old count for the because that doesn't exist was zero the new one's one clown's old is zero new is one ran's old is zero but now we get to the its old count was one and now its new count is two okay so by using this get and saying if we don't find it we'll assume the count is zero that makes a lot of sense right you know
if not there the count is zero if the key is not there the count is zero okay so that's what this line does if get the get the value from under the key be associated with the key or give me zero back and then i can take that old number and just add one to it and then stick it back in now this is ultimately not how we tend to do it okay we tend to blend this all into one big long statement d i sub w equals this part plus one okay so that says get the old value from this key or zero and that will add one to it because that really combines all of these lines into a single line okay so i'm going to delete them now and now we've combined this all into one what effectively is an idiom retrieve create update counter all in one line i'll still i'll still print out in this case i'll just say d i sub w and then we'll see the counter okay so now i'll run this we don't it's it's we we have a new but now we see it the second time it's two and so we see car the first time we see that the second time we see i mean the third time we see car the second time and away we go okay and so that's pretty straightforward and so it really kind of typo there so let's just get rid of that and run it with the clown stuff and we get the right data there and let's run it with intro dot txt and there we go okay and so it's it's tearing out a bunch of words and giving us a dictionary so that was a lot of work to get to this line 16 that has the dictionary in it now we want to find the the most common word and so we're going to loop through this dictionary and part of it is like once we printed this dictionary out and we verified that it's right don't worry too much about the code up here right matter of fact i can take out some of these print statements and we can kind of trust all this and so now we're going to work on this okay now we want to find the most common word now this is like a maximum loop so if you recall um we have a whole set of key value pairs communicate goes to 2 is to 2. 
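The one-line counting idiom, followed by the maximum loop recalled here, can be sketched as follows. The sample sentence is a stand-in, not the real contents of clown.txt, and the names `bigword` and `bigcount` are assumptions based on how the slides are narrated:

```python
# Stand-in text; not the real contents of clown.txt.
line = "the clown ran after the car and the car ran"

di = dict()
for w in line.split():
    # Retrieve the old count (0 if the word is new), add one, store it back.
    di[w] = di.get(w, 0) + 1

# Maximum loop over the key/value pairs, primed with None.
bigword = None
bigcount = None
for word, count in di.items():
    if bigcount is None or count > bigcount:
        bigword = word
        bigcount = count

print(di)
print(bigword, bigcount)
```

Priming with `None` avoids having to assume anything about how small a count can be.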
skills is 3. so we have these key value pairs and we're going to loop through and look for the maximum now in a dictionary we can loop through the key value pairs with the following syntax for you know i would call this these variables k and v for key and value but uh yeah in the dictionaries name dot items and items is a method inside of all dictionaries that says give me the key value pairs and we need two iteration variables so this is like an assignment statement for k and v k and v take on the successive values for the keys the key and the value okay so if i just now print k comma v and i'll take this print statement out and then run the code on whoops what i forgot to oh i fell back into my python two days i need parentheses for my print so there's clown and it just prints it out and it's kind of the same thing except as pretty where we're putting each one on a line okay so the k the v is the value so we're looking for the largest value oops so the thing is we know that the values are are always numbers that are greater than one so i'm gonna i'm gonna do kind of a a quickie maximum loop largest equals negative one now in previous times we've seen that this is a bad assumption but because we know these are counters that are always positive it turns out this is not a bad not a bad idea and so i can say if the value is greater than the largest we've seen so far largest equals the value okay and when that loop is all done we can print the largest okay and so this is just a max loop and we're using this value that's the number the value is the second thing oops ah can't type python oh it's a typo yeah i'm not using value i'm using v so largest equals v let's try it again okay so we're all done with seven so these were the things that we were looking for and it was looking for the maximum and it just dutifully found seven was the largest but we also want to know what the word is and so what we can say here is we can say the word is none meaning it's it's just like 
we don't know what the word is and then whenever we catch this new largest number we say the word equals w so i like to think of this as capture remember the word that was largest right that's what i'm doing let me type remember there we go um so this trick here is not only knowing what the largest number was but the word that was associated with the largest number so now i can print out at the end the word and the largest and that's the count okay and so now we know that oops did we make a mistake here okay that does not look good because it says car and seven if v is greater than the largest oh it's not w i used a really bad variable there we go it's k which is the key that was quite the bug see what happened there i had this as w and it just happened to be the last word in the file car the last word in the file because i used a wrong variable right little mistakes little mistakes the and seven okay so let's get rid of this print statement because we kind of know what's going on here and uh away we go and this should now work if we run it and we can even get rid of the word done here there we go the seven now the cool thing about this is this code runs just as easily with one line of text or the intro of the book intro.txt and not surprisingly the is still the most common word in intro.txt i seem to like that word and it's 226 times okay and so that is the basic pattern this is just a word loop now sometimes there would be some you know checking to see if the line is the one you're interested in maybe tearing apart the line but at the end of the day it's this idiom of starting a dictionary now it's a common problem to know where to start the dictionary you want to accumulate the numbers for the whole file so you don't want to put it in between line six and line seven okay so i hope that
particular thing helps a little bit uh helps you understand uh dictionaries hello and welcome to chapter 10. now we're going to talk about our third kind of collection called tuples but tuples are really a lot like lists there's not too much to them they're really kind of a reductionist version of lists so they function very much like lists in that um you know they have things in them and the difference is there are no square braces there is a parenthesis round brace or whatever and they have positions 0 1 and 2 just like a list and you can look things up x sub 2 so x sub 2 is the third element here and so that prints out joseph uh you can make a tuple here this is the constant syntax for a tuple and print that out and the print statement shows you that this is a tuple not a list by showing you round parentheses and a whole bunch of functions that work with lists work the same way with tuples you can put a tuple after the in of a for statement as you might expect and then it iterates through the tuple tuples maintain order so it prints out one nine and two so literally this bit of code here could be identical whether it was a list or a tuple it really would do the exact same thing the difference with tuples is that they are immutable once you create the tuple you can only sort of assign a whole tuple but you can't modify it you can modify a list so if we take a look at a list here we make a list that's 9 8 7 and we say x sub 2 equals 6 well that just means this 7 becomes a 6 and that's just natural meaning we can reassign slots we can delete things we can insert things we can mutate them we can change them so they're changeable but if we try to do that same thing with a string so we say y equals abc and we know that this is position 0 1 and 2 but if we try to say let's change the c to a d by saying y sub 2 equals d that is not allowed and it says it doesn't support item assignment and
this little bracket you know x sub 2 is what they call item assignment inside of python and so if we do the same thing then with a three element tuple put that in z and we try to change this slot to be a zero it's going to blow up because it's the exact same thing and that has to do with the fact that once this assignment is made this is not modifiable now it turns out that the reason it's not modifiable is for efficiency they take up less storage they are quicker to access and they're really designed internally behind the scenes in ways we don't really need to understand they're just more efficient than lists if all you want to do is store list and look at it and then throw it away you probably should use a tuple instead so there's a lot of things that you can do with lists that you also can't do with tuples but they're really just a corollary of this notion of non-mutability and so like you can sort a list but you can't sort tuples you can add a 5 to the end of 3 2 1. can't do that in a tuple but you can in a list and flip the order dot dot dot dot dot so anything that you can do to a list that modifies the list not allowed for tuples and so you can take a look at the kinds of things that are inside the methods that are part of each list append count extend index insert pop all some of these many of these are modifying and then count and index are the only ones that work for uh for tuples and so tuples are limited lists now at some point there's going to be a but here to say why do we like them and um the reason that we like them is that they're just more efficient they don't have to build in it python in its own internal organization of these objects it knows that they'll never never be modified because when you make a tuple you as the programmer saying i'm never going to modify this and python won't let you do it so it's higher performance better memory use and you know to a beginning programmer that doesn't really matter but that's the reason and so we tend to 
use tuples in situations where we're going to make a temporary variable and then use it just a little bit and then throw it away without really messing with it we tend to use lists to build things up etc etc etc so the other thing that's interesting about tuples and we've actually sort of seen this is that you can put a tuple that includes variables on the left side of the assignment and this takes a little getting used to but it's really cool and no other language that i know of does this so if we say x comma y that's a two tuple both have to be variables you can't put constants on this side you know it's like saying x equals four y equals fred right so what happens is you can put a tuple on the other side of the assignment statement and the four goes to x and the fred goes to y and you say what's in y well y is indeed fred and so this is like two assignment statements now the way i've got this syntax i would probably do you know two separate statements just to not show off that i know how to do tuples um you know and so here's another one and they just move correspondingly if you have 2 here and 3 here and you don't match the number there you get in some trouble now if you just say x equals a tuple then x is the whole tuple but this is just a simple straight 99 value going into a so you can put tuples on the left-hand side and you can even do things like return a tuple from functions that's a real nice python feature that i like a lot um tuples are also related to dictionaries as we've seen in the previous chapter so here we make a little dictionary we make an empty dictionary by constructing an empty dictionary stick it in d so d is sort of like this place that can hold key value pairs and we put csev and there's a 2 in there and cwen and there's a 4 in there so we have this you know associative mapping between csev and 2 and cwen and 4 all stuff we know and
now we say hey we're going to loop through the key value pairs here and we've seen this syntax before k comma v so this is a tuple so you can think of this as each one of these things is going to get assigned into this tuple which means the first one's the key and the second one is the value i use the variables k and v all the time in code that i write just for my own sanity so k and v are going to iterate through the successive keys and values in that so this is going to run twice and k and v are going to be csev 2 and then cwen 4 the order just happened to stay the same and so if you ask um what is in one of these things you can actually take d dot items the items method within that dictionary and say hey give that back to me and then print tups and it's a special kind of a class but really ultimately it is a list of tuples you know it has two things in it this is the zero and this is the one the first and the second and then within each thing you have a two tuple and so in a sense this k and v are iterating through those things when we're putting d dot items here and d dot items there one nice thing about tuples is that they're comparable they're comparable in the same way that strings are comparable meaning that they're compared from left to right with the leftmost or zeroth element being the most significant and it doesn't compare any further than it has to if it's asking less than so if it's looking at say this first tuple it starts at the left and it asks the question tell me true or false is 0 less than 5.
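A minimal sketch of the left-to-right comparison being set up here. The example tuples are assumptions reconstructed from the narration; the exact values on the slide are not in the transcript:

```python
# Example tuples are assumptions reconstructed from the narration.

# Decided by the first elements alone: 0 < 5.
first = (0, 1, 2) < (5, 1, 2)

# First elements tie (0 == 0), so the second pair decides: 1 < 3.
# The third pair is never consulted, even though 2000000 > 4.
second = (0, 1, 2000000) < (0, 3, 4)

# Works the same way for tuples of strings: "Jones" ties,
# then "Sally" < "Sam" because "l" sorts before "m".
names = ("Jones", "Sally") < ("Jones", "Sam")

print(first, second, names)
```

Each comparison stops at the first position where the two tuples differ, which is what makes a list of tuples sortable.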
the answer is true and so the answer to this overall expression is true and it doesn't even compare those two numbers those second and third number they don't compare them if on the other hand we're asking is this less than that it only looks at the first one and asks if it can answer the question the answer is well they're both zero and so i can't answer the question so i have to go to the second one second pair and one is less than three and so that means this is true and it does not check this even though 20 million is bigger than four it doesn't matter because these are the numbers that cause the the true to happen and the same is true if uh if you do this with strings again we start the first one so jones sally well that's the same so we don't know the answer yet and so sally sam well okay s s oh they're the same a a they're the same oh l and m l is less than m so so the actual letter that makes the difference here is the l and the m and leads to us being true and so this it goes left to right but then even when it's doing strings it's going left to right that's just how string comparison uh works and um if we say say as jones uh jones sally greater than adam sam well we checked the first one and we checked the j and the a well j is greater than a and so we don't have to look at anything else we don't have to look at these any more of these characters we don't have to look at the second thing in the tuple we have to look at that is enough to be true so it only scans until it has a definitive answer it doesn't scan any further so now what we're going to do is use this comparable capability to sort these list of tuples and then bring this all back and connect it more to dictionaries [Music] so now we can take advantage of the notion of comparing tuples and use sorting and so what we're going to produce is a list of tuples and then we're going to sort them right and so we can get a list of tuples from a dictionary and then we can sort that list of tuples and then 
we can end up sorting dictionary items by taking this two-step process convert the dictionary to a list sort the list and then we can have a sorted view of the dictionary okay and so we'll do this a couple of different times so if we take a look at this code right here we have our happy little dictionary a b c a maps to 10 b maps to one c maps to 22 like what are we going to get here well it comes out the mapping is the right way but the order is whatever and now we use this function called sorted which takes in a sequence and then returns us a sorted version of that a list that's sorted and so it says sorted of d dot items so it's basically going to take this list and compare the a's and the c's and the b's and because it's a dictionary and all the keys are unique there's never going to be equality so it really is going to just sort this by keys and never get to looking at the values you could construct a list of tuples that had duplicates in the first position like we did before but given that this is coming from a dictionary the first thing is going to always be unique and distinct and so if we say sorted of d dot items we're passing this stuff into sorted sorted is going to go around and move stuff around and then give us back a sorted version sorted in ascending order based on key without looking at the value and so that's a way to see dictionaries sorted by key you just say sorted of d dot items and sorted is a function and so this is the kind of loop that you're going to write to do that you know we did this before we took sorted and we got these sorted by keys and so you can just make this nice and simple for key value by the way you can eliminate the parentheses here and i think it's prettier if you eliminate the parentheses but you could put parentheses this is still a tuple without the parentheses for k comma v key and value in sorted so that says go through d dot items but before i
go through them, please sort them. That means k is going to go through a, b, and c deterministically every single time, and of course v is going to go through the corresponding values. So now we can print this out nicely sorted by key, and that's a real nice, succinct little way to say that. Again, this is one of the kinds of things that people really like about Python: you can do pretty powerful things with easy-to-understand code. You might be seeing this for the first time, but eventually you'll look at it and be like, oh yeah, I see exactly what that's doing. Easy, not hard at all. But let's say we're looking for the most common word, which we have been for weeks and weeks now, and so we want to sort by value, not key. This is an example where we're going to imagine a data structure, then write code to construct that data structure, and that's going to make our problem easy. So this is an example of using cleverly constructed data structures. The data structure that we're going to create is a list of tuples where the value is first and the key is second. With items() you get key, value; I want value, key. So let's take a look at this code; take your time and get it right. for k, v in c.items(): well, that is unsorted and is going to go through a, b, and c in whatever order. And we're going to make a new list, a data structure that we're creating temporarily. This is a list, and we're going to append to that list a tuple, so this is going to be a list of tuples. Except we're not going to append them in key, value order; we're going to flip them, so the first part of the tuple is going to be the value and the second part is going to be the key. So we end up with this, which is sort of our temporary data structure that we have constructed to make our job really easy.
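The two ideas described so far, looping over the items sorted by key and building a flipped list of (value, key) tuples, can be sketched roughly as follows. The dictionary values are read off the lecture; the variable names by_key and tmp are only illustrative, and the order of the unsorted flipped list depends on the Python version's dictionary ordering.

```python
# Sample dictionary with the values mentioned in the lecture.
d = {'a': 10, 'b': 1, 'c': 22}

# sorted() on the (key, value) tuples compares the keys first,
# and because dictionary keys are unique it never needs the values.
by_key = sorted(d.items())
for k, v in by_key:
    print(k, v)

# Build the temporary data structure: a list of (value, key) tuples.
tmp = []
for k, v in d.items():
    tmp.append((v, k))  # flipped: value first, key second
print(tmp)
```

Once the values sit in the first position of each tuple, the same sorted() call orders the list by value instead of by key.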
This ends up being (10, a), (22, c), (1, b). We took this order and we just flipped them around, and so now we have this nice little list sitting in memory in a variable. And that's really simple: we can use sorted, and now we can sort by the values because they're the first thing. sorted doesn't know how we produced this list; it just looks at it and says, oh, that's a list of tuples, I'm going to sort by looking at the first item in each tuple. And I'm going to add reverse=True so I get a descending sort, so the value that is highest ends up being first. So I sort it, reassign it back into temp, and print it out, and now you see it's sorted in descending order of value. It's value, key, value, key, value, key, but it's sorted in descending order. So that's an example: if I just made a data structure and flipped those things around, I could use sorted to sort these things. There are many other ways you could do it, but this is sort of the more elegant way, and the clever bit here is to make a new list and make it be a little bit different. Okay, so here we're going to print out the top 10 most common words in a file, and most of this code is review. If we take a look at it, we're going to open a file, we're going to start a dictionary for our counting, and there's going to be words and lines. So we're going to have a for loop that goes through each line, and then of course we're going to split each line, which is busting it into pieces. Then we have a for loop within that, and this for loop is going to go through each word. By nesting these loops, we're going through each line, and then within the line we're going through each word, then we go to the next line and go through its words. And eventually this line of code, counts sub
word equals counts.get(word, 0) plus one, is our idiom for making a histogram. This line right here is an idiom; if you don't already know what that is, go back to the previous dictionary lecture and understand it, because you're just going to use it over and over again. So now at this point (and I always like drawing horizontal lines in code when we write it), counts is the histogram. It's not sorted, so now we want to sort it. We're going to make a new list, we're going to loop through key, value, and then we're going to make a tuple. I'm making this two lines to make it a little easier: (value, key), so I'm flipping the order of these things. That's making a tuple, and then I'm appending that tuple to the list. So at the end of this we have a list of tuples in value, key order. At this point, in my lst variable, I've got this really useful bit of data that I produced, and then I'm like, oh, now it's ready to be sorted. Poof, sort: take lst, sort it in descending order, and then stick that back in lst. Now we want to print it out, but we've got a nice sorted list coming down here, and we don't want to print it in value, key order, because that's what it is: it's in (v, k) order, but it's sorted, and we know that the highest value is here at the top on down. So now we're going to go through this new list, only the first 10: start at the beginning, up to but not including number 10, which is the first 10.
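The whole top-10 program walked through above can be sketched like this. To keep the sketch self-contained it reads lines from an in-memory string rather than opening a file; that string and the name newtup are assumptions made for illustration, not taken from the slide.

```python
# Stand-in for the lines of a file (an assumption so the sketch runs as-is).
text = """the clown ran after the car and the car ran into the tent
and the tent fell down on the clown and the car"""

# Build the histogram with the counts.get(word, 0) + 1 idiom.
counts = dict()
for line in text.splitlines():
    words = line.split()
    for word in words:
        counts[word] = counts.get(word, 0) + 1

# Flip each (key, value) pair into a (value, key) tuple.
lst = list()
for key, val in counts.items():
    newtup = (val, key)
    lst.append(newtup)

# Sort by value, highest first.
lst = sorted(lst, reverse=True)

# Take only the first 10 and flip back to key, value for printing.
for val, key in lst[:10]:
    print(key, val)
```

With a real file, the loop over text.splitlines() would become a loop over the file handle; everything after the histogram stays the same.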
For value, key in that slice: these are the iteration variables that are going to go through each of these things on down, and then we're just going to print it out flipping it. So we re-flip it, flip flip, we print out key, value, and it's going to work. Okay, so that is one way of doing this. And this slide right here you absolutely do not need to figure out, but some of you will look at this slide and be like, why didn't you show us that in the beginning? And others of you will be like, no, no, no, keep telling me this stuff here. I don't know exactly the term for this, but this is a very procedural, classic algorithms-and-data-structures approach to solving this problem. This next thing uses what's called a list comprehension, and it kind of creates what I call a closed form, where you do it all in one statement and there's all this implicit stuff going on. So if you don't get this right away, don't worry too much about it. But roughly this single line does everything that the bottom half of that program does. If we go back to here, pretty much this line does all of that in one line. It doesn't create the counts and it doesn't print out the top ten, but it does everything in that middle bit. So let's take a look at this as we collapse it down. We have a print with its opening parenthesis and the closing parenthesis at the end, and then we have sorted. Remember that sorted takes as input a list, so that's not too bad, and returns us a list, and so we'll print the return from sorted. And then this is the fun part: this is called a list comprehension. We have square brackets, and we say to Python, this is a list, but instead of listing the things as a constant, 1, 2, 3, or doing append, append, append, we're going to create an expression that will act as a generator for all the elements. So this basically says this is a list of two-tuples (v, k), and then this is sort of implied
for all k, v in c.items(). So this is like a for loop that is sort of driving this. Think of it as stamp, stamp, stamp, however many times it has to make a stamp, and that's producing a list. It just manufactures this list in the moment; it's not put in a variable. Python makes that list according to the stamping pattern that you've told it, and then it passes that stamped-out list, without even storing it in a variable, into sorted. sorted moves the list around, because it is just a list of tuples, and then gives us back the sorted list. I didn't put reverse=True on here, so you see that this is sorted in ascending order, now by value, and I did that all in one little statement. This is also one of the beautiful things about Python: you can build these things, and you can build more complex versions of this, and there are a lot of really elegant things you can do in Python that are really succinct. You should be careful, because in the beginning I think the longer version is easier to understand, even though after a while you're like, wait a sec, why am I putting in all these extra lines, because this is not so hard to understand. But at some point you will want to master this more powerful and more succinct version of Python that expresses things in terms of the data you want to see rather than the steps you want to take. So this sort of finishes up tuples. We've done a bunch of stuff. Really, they're simple and elegant. Tuples, lists, and dictionaries are all related; they're three foundational data structures, three foundational collections of Python, and we combine them in a lot of different ways.

Hello and welcome to Python for Everybody. I'm Charles Severance, and in this little bit of lesson we're going to talk about some tuples, and we're going to create a list of the
most common words and find out how to sort a dictionary by the values instead of by the key. We're going to use the clown.txt file and the intro.txt file, and I'm going to start with the code that I just did from chapter 9. It's not exactly one of the exercises, but it's very similar to them. I'm going to make a copy and keep it in the same folder, the ex09 folder, and just call it ex10, because this code is going to do much of the same stuff and it's going to read these same files. So exercise 9 is still here, and exercise 10 is now what I'm editing, but I'm in the exercise 9 folder. In exercise 9 we looked for the most common word, but now we want to find the five most common words, which is going to require us to sort. So I'm going to get rid of that code, because it's not really how we're going to do it; there we manually looped through and found the maximum. So I'm going to cd into desktop, python for everybody, ex09, and if I do an ls you see that I've got ex09.py and intro.txt. So I'll run python3 ex10.py on the clown data, and we see the dictionary is properly being made by this code right here. That doesn't change: it reads the file, reads all the lines, splits each one into words, and then goes through the words and does the idiom of using the dictionary get to maintain the counters, and we print it out at the very end. So the new code we're going to write is down here. Okay, so let's first do a few things. I can say x is equal to the dictionary's items(), and print x. This gives us basically a list of the key, value pairs. Printing the dictionary prints out the dictionary, but if we do it this way and use items(), it gives us the key, value pairs. Now we can sort this based on the values of the tuples, because tuples can be compared: this can be compared with this, and because d is lower than r, then this one is
lower, and this whole ran tuple comes after the down tuple. So we can sort this whole thing, and I'll do this by just putting the word sorted here to say give me a sorted version of that. Now it's going to do it based on the order of the tuples: the first item has higher precedence than the second. So if I print it this way and run it again, you'll see that it's sorted: after, and, car; it's in alphabetical order by key. And we could actually print the first five, up to but not including five, by adding a slice on this list, and that will show you only the first five. Except that's not what we're trying to do; we really want to sort by the value. Okay, so we have this mechanism that can take a list and sort it based on the tuple values. If we could create a list where it was (1, after) instead of (after, 1), and otherwise the exact same thing, then we could sort it and it would be fine. So let me show you at least one way to do that. Get rid of this; we're going to hand-construct a list, and I'll just call it temp: temp equals a new list. And then for k, v in the dictionary's items(), and I'll just start by printing k, v so we see. This is where it's really nice to do these with the clown file first and only test on the bigger file later. So it's pretty much the same thing: we're going through in key, value order, which is dictionary order, which is not sorted at all. Now instead of printing this out, let me do this in a couple of steps: make a new tuple, and we'll just call it newt, equals (v, k). So I'm saying make a new tuple with two items in it, and I'm going to make the value first and the key second. Then I'm going to say temp.append(newt), the new tuple, so I'm going to end up with a list of tuples. Let me comment this one out, and then when I'm done here I'm gonna
print temp. So if I run it on clown.txt, you see what happens in temp. It's not sorted, it's flipped; let's print it, and that's the flipped one. All we did is, instead of (car, 3), it's (3, car), but now we have a list. Okay, so now it's flipped, and now we can sort that. We can say temp equals sorted(temp), so it says take temp, sort it, and give it back to me. And now I'm going to print the sorted temp. So here's the first print, when we flipped it: we've got (2, tent), but it's not sorted at all. But after we sorted it, it's sorted by tuple, and the lowest is (1, after). You'll notice that one is the same as one, so it checked the second item in the tuple: after comes before down, fell comes after down, then into and on, in alphabetical order. Then we get the twos: all the ones sort there, then the twos come here, and within the twos it sorts in alphabetical order, because like a string, if the first character matches then it looks to the second character. Then here we go, the threes, and then the one we actually wanted, the highest one, is the seven. You'll notice that we want the highest one, not the lowest one, so we can just tell it with this parameter reverse=True, and we say, hey sorted, do this backwards, from highest to lowest rather than lowest to highest. And now our sorted one says (7, the), etc. And since we want the first five, we can slice up to but not including five, so this is now the top five. If there is a tie, we're going to go in reverse alphabetical order, but let's not worry about that too much for now. So it makes a flipped list, then it sorts the flipped list. Now if I just wanted to print it out nicer, I could loop through this new list. I could say for v, k, and remember this is a flipped list, so the
sensible thing is to name them for what's coming out of this list: each tuple is value, key. So for v, k in temp, and I'm only going to go up through but not including five, so the first five. I'm pulling them back out as value, key because that's what they are: value, key, value, key, value, key. So v is going to go through these and k is going to go through these, and then I'm just going to print k, v. This is kind of my flipping backwards, because I want to see them this way: there's the most common one, then car 3. So it's just going through this up to the fifth one and printing them out. Okay, so let me comment this out, let me comment that out, let me just delete this. So we have a dictionary (let me comment out the dictionary print). We make a list, and we make these reversed tuples where we have the value first and the key second. We're setting it up so the sort's going to work, and then once it's sorted we have to flip them back. So we flip them from key, value to value, key for sorting, we do the sort, then we flip them back to key, value and print them out, and it works fine. So let's try our big file, intro.txt, and there you go: those are the five most common words in intro.txt. You might ask yourself, why did we use tuples? We could have really used lists for this, but tuples are more efficient than lists. And you notice that we did modify the temp list, which is a list of tuples, but the tuples within the list we weren't going to modify. So we tend not to make lists if we can get away with using tuples, and that's why we made this flipped tuple thing. Okay, I hope that was useful to you. Hope to see you on the net.

Hello and welcome to chapter 11, regular expressions. The fun thing about this chapter is that it's unlike all the rest of the chapters, where you sort of had to really understand every single thing. Chapters one through eleven built on one another,
or rather one through ten built on one another, but you can really get along without using chapter 11. It's not really a required topic, but it's a fun and interesting topic. So you can relax a little bit and realize that you may or may not like regular expressions, and if you don't like them, that's okay; you don't have to use them. You can go your whole life without using regular expressions. The idea of a regular expression is that you come up with a language, a little character-based programming language, where you can do smart searching and, as you'll see in a bit, smart extraction. It's really almost programmable wildcard expressions. There's no explicit looping, but there is looping, and it's all implicit: you say look for patterns that look like this, and then you get back things that match those patterns. We do searching for everything. We're looking through large blocks of text and saying, go find me everything that has the word python in it, or something like that. That's just such a common thing to do, and regular expressions are a very structured way to go about searching for information. They're very powerful, but they're also very cryptic, and you may not like them, but they're a lot of fun once you understand them. Learning how to program them takes a while. Writing good regular expressions requires some try it, play with it, check it, try it, check it. But once you get there, they're really quite cool. It's a very old programming language; it comes almost from the 1960s. The concept comes from the theory of computing, where they were trying to come up with a theory of languages, and regular expressions were one form of language that computers could understand, and so it has some fun old words. One of the advantages of knowing regular expressions is that you're kind of a cool person. You can take a quick look at this xkcd that
sort of captures the devil-may-care awesome power that regular expressions have. And while we're talking about awesome, I do want to take this moment and show you my awesome tattoos. You may not know this, but I got a couple of tattoos. Here's the first tattoo: this is where I got my PhD, and this is my University of Michigan faculty position. I got my PhD in engineering, and I teach in a school of information and library science. And then I have this other tattoo, which I call the ring of compliance. I work on learning management systems and educational technology and standards, and there's this standard called Learning Tools Interoperability. If you're taking this course and using the autograder, it uses Learning Tools Interoperability to integrate into whatever learning management system you happen to be using. One of those learning management systems is the open source learning management system that I helped write, called Sakai, and these are the rest of the major vendors. The idea of that tattoo was that I would put on the tattoo of every vendor that would comply with Learning Tools Interoperability. You'll notice Coursera: I helped Coursera put Learning Tools Interoperability in, and so the autograders integrate into Coursera or Blackboard or Canvas or Sakai or Moodle or other things. So it's just a cool techno thing, just like regular expressions. I've got a URL here for the regular expression quick guide. You might want to print this out so that you can look at it even while you're watching this lecture, because it's a little programming language, except that it's character based, not line based and not keyword based. It has certain active characters, where the character means something rather than representing the character itself. The regular expression library is not part of base Python, but it's distributed with Python, so you have to put an import re at the top
to say, pull in the regular expression library. There are a couple of functions inside it: re.search, which is kind of like a really smart version of the find method of strings, and re.findall, which is kind of like stamping your way through a string in a loop, finding all of the things that match a particular pattern and extracting them. We'll talk about both of these in this lecture. So here's a really simple piece of code where I'm just going to show you before and after. Here we're looking for lines that have From: in them. We open a file, we loop through the whole file, we strip each line, and then we say if line.find('From:') is greater than or equal to zero, we print it; find gives you negative one if it's not found. So it reads all the lines and once in a while prints one out. That's kind of like a needle in the haystack. To use regular expressions to do that, we have to import the regular expression library. These lines are the same: we're going to loop through, we're going to strip, and now we're going to say if re.search. The way to read this is: within the regular expression library, go find the search function and search for the string From: in the string line. So this is the line to search, whereas there it was more object-oriented, where we say line.find. Here we say re.search and we pass in line as a parameter. These two things are equivalent, which means most of the time it's going to run along, once in a while hit a line and print that out, and then finish the whole thing. So that is doing with regular expressions what we would do with the find operation. Now, searching with regular expressions has these special characters. Here we have the same basic code, except now we're saying if line.startswith('From:'), so we're not using find anymore, and that way we're only going to get that thing in the
first position. Not like blah blah blah From:; we don't want that to match, we only want to match at the beginning of the line, and so we use line.startswith. It's going to do the same thing: find lines that have the prefix, print those out, and then be done. Now in regular expression search we don't change the method. We have a certain number of things we can do with strings based on the methods they've built in, but with regular expressions we can actually turn this first parameter into code. What's happening here is the caret. If you go back to the little cheat sheet, caret means the beginning of the line; it's a virtual character that matches the beginning of the line. So From: at the beginning does match and From: in the middle does not match, just by putting that little caret there. Same thing: line is what we're searching, and caret From:, meaning From: at the beginning, is what we're looking for. Again it does the exact same thing: it only prints lines that have From: as the first characters in the line. So the difference is that in one we look for a method and in the other we program the regular expression, and we're going to run out of methods in the string class long before we run out of things that we can do with regular expressions. A couple of other special characters: caret matches the beginning of the line, this capital X matches itself, dot is a wildcard that matches any character, and some of the characters in regular expressions modify the immediately preceding character, like the asterisk, which means zero or more times. So this says look for a line that starts with X and then has any number of characters (that's these two things together, zero or more characters) followed by a colon. You can see that it's sort of like an expanding stamp. It's like, oh, there's an X at the beginning of the line, that looks good, I've got some characters here, and I've got a colon.
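The before-and-after comparisons described so far (find versus re.search, startswith versus a caret pattern, and the X pattern with dot and asterisk) can be sketched side by side. The sample lines below are made up for illustration and stand in for lines read from a mail file.

```python
import re

# Hypothetical sample lines standing in for a mail file.
lines = [
    'From: stephen.marquard@uct.ac.za',
    'Reply-To: see From: header',
    'X-Sieve: CMU Sieve 2.3',
    'Subject: hello',
]

# String method vs. regular expression: 'From:' anywhere in the line.
found = [ln for ln in lines if ln.find('From:') >= 0]
searched = [ln for ln in lines if re.search('From:', ln)]

# String method vs. regular expression: 'From:' only at the start.
prefixed = [ln for ln in lines if ln.startswith('From:')]
anchored = [ln for ln in lines if re.search('^From:', ln)]

# Starts with X, then zero or more of any character, then a colon.
x_lines = [ln for ln in lines if re.search('^X.*:', ln)]

print(found == searched, prefixed == anchored)
print(x_lines)
```

The string methods need a different method for each question, while the regular expression version only changes the pattern passed as the first argument.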
That's good. So this is an X, some characters, and a colon: check. X, some characters, and a colon: check. And away we go, so that's what's going to match. You can see how some of these characters are special (again, go back to your cheat sheet) and some of them are actual characters. This colon and X are not special; they're just the characters. Now, sometimes you want to be a little more precise in your match. Let's take a look at the lines that match that particular pattern we just did. We have these two, X-Sieve: and X-DSPAM-Result:, which are from mail messages, and then one of the mail messages has a line that says X-Plane is behind schedule: and this matches. Is that what you really wanted? Because this is an X, this is some number of characters, and that's a colon, it has to match; this rule applied to this line results in a yes. So how can you be a little more clear as to what you want to match and what you don't want to match? We can write more code. Now what we're going to say is we want to match the beginning of the line, and we want a capital X, and we want a dash. So now we're going to match those first two characters, X dash, at the beginning of the line: caret X dash says the first two characters of the line must be X dash. Now we have another special character; again, refer to your cheat sheet. Backslash capital S means a non-whitespace character, any character other than whitespace, and then plus means one or more times. One or more non-whitespace characters, that's what this whole thing says, followed by a colon, which is just a character. So now we have X dash, followed by one or more non-whitespace characters, followed by a colon. Here on this line, X dash followed by one or more non-whitespace characters followed by a colon: it matches. And here we have X dash followed by one
or more... oops, there's a space there, and so this doesn't match. Even though there's a colon there, the pattern means that between the dash and the colon you can only have some number of non-whitespace characters, so this is a no; it does not match. So if you didn't want to match this, you create a more precise pattern. We could even have a thing that said I want X dash followed by an uppercase letter, if you wanted to. There's all kinds of fine tuning you can do once you learn the structure. So that's the matching, where you're taking a whole line and this template and deciding if the template matches anywhere on that line. What we're going to do next is use this to actually pull data out of strings using the regular expression library.

So now we're going to move from merely matching to matching and extracting. We're going to say, hey, I would like you not only to take this template, this regular expression pattern, and run it across the line; I want you to give me all the pieces that match, and I want a list of those. That's what we're going to use findall for. search gives a true or false; findall gives a list of all the strings that match. If there are four of them, you'll get four things in the list; if nothing matches, you'll get an empty list. So let's take a look at what we've got going here. Instead of calling search we call findall. We still pass in the string that we're looking through, and then we have our little template pattern. This is a new bit of regular expression syntax, the square bracket. A square bracket expression matches one character, and in between the brackets is a set of allowed characters. So [0-9] means a single digit, zero, one, two, three, four, five, six, seven, eight, or nine, but that's really one character. And then the plus applies to that, which means if we look at
this whole thing, it says one or more digits. That's the code we write in a regular expression that says one or more digits, and we're just going to use that by itself as our regular expression. So we're going to look for any string that's one or more digits, pull it out, and give it back. So that's my little template: stamp, stamp, stamp, oh, got it; stamp, stamp, stamp, oh, got it; stamp, stamp, got it. What we get back after we ask findall to find all of the one-or-more-digit strings is 2, 19, and 42. It actually parsed it, split it, found all these things, and said, I found them all for you and here they are: 2, 19, and 42. It's a list of three strings because that's how many it found. It might have found none, and we'd get an empty list at that point, but it found some. Now, just as an example, we did this and got 2, 19, and 42, but if I said this, [AEIOU], that is an uppercase vowel, A, E, I, O, or U, so that's one letter, and the plus makes it one or more. So something like AA would match, EI would match, O would match. But if you look now, it's looking for one or more characters from the set of uppercase AEIOU. So it says, do you find, oh, there's an uppercase letter, but it's an M, no; no uppercase, no uppercase, no uppercase. It did not find anything, and so it gives us back an empty list. It's like, find all the things that match this, and the answer is none match; here's your list of nothing. And so you have to check what you got back, because it's not going to return false; it returns a list with no items in it. Now the way it works, like I said, is it's taking this template and stamping it across the line, stamping across the characters. Now, there's a behavior
that might not be intuitive to you at the very beginning, the notion of what we call greedy matching. That is, when it can match more than one possible overlapping string, it chooses the largest of the overlapping strings. The easiest way to show this is with an example. We're saying I want something that starts with an F, followed by one or more characters, and ends with a colon. So that's my little stamp, my template. It starts with an F, good; one or more characters and a colon, so that could be From:, and that would match. But look, I've got another colon here, and this is just continuing on with one or more characters. So the question is, do we get this part or do we get this part? And the answer with greedy matching is we get the larger of the two. So what you get back is somewhat counterintuitive: you get the whole thing as the match. We could have gotten From:, but the reason it picks this one is that it's longer. Anytime it has a choice, it picks the longer one, and that's what greedy means; it's probably better described as tending toward the longest string, or something like that. You can of course suppress this behavior. Like everything in regular expressions, you simply add another character. So now it's going to say I would like to start with the letter F, then any character one or more times, and then this question mark, which means non-greedy. That just says do it not greedy, which means it prefers the shorter of the strings. It could still match this string or this string, but because it's been told not to be greedy, it chooses the shorter one instead, and that's the string that we get. So that's non-greedy: you just add the question mark after the plus or the asterisk. It's usually an asterisk question mark or a plus question mark, so that's a two-character thing.
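The findall calls and the greedy versus non-greedy behavior described above can be sketched as follows. The two sample strings are assumptions chosen to be consistent with the results mentioned in the lecture (2, 19, 42, and two colons after an F); the slide's exact strings are not in this transcript.

```python
import re

# Assumed sample string containing three runs of digits.
x = 'My 2 favorite numbers are 19 and 42'
digits = re.findall('[0-9]+', x)    # one or more digits
vowels = re.findall('[AEIOU]+', x)  # one or more uppercase vowels
print(digits)
print(vowels)  # an empty list, not False, when nothing matches

# Assumed sample string with two colons, so two matches are possible.
y = 'From: Using the : character'
greedy = re.findall('^F.+:', y)     # plus is greedy: longest match
lazy = re.findall('^F.+?:', y)      # plus question mark: shortest match
print(greedy)
print(lazy)
```

Because findall always returns a list, checking the length of the result (or just using it in an if) is how to tell whether anything matched.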
that's zero more characters non-greedy and that's one or more characters non-greedy um actually most of the time the uh it seems to me that the non-greedy would be the more reasonable default but that's not how it is a greedy is the default and non-greedy is optional now we can play some more with this stuff okay and so uh let's take a look at this little example where we have a non-blank characters backslash capital s one or more of those non-blank characters followed by an at sign and then again one or more non-blank characters so this is looking for strings that have an ant sign with non-blank characters on both sides this is an example of where it sort of comes to this at and it goes this way and it does it in a greedy manner if you told it to to not be greedy it would give you this these three characters but we're telling it to go greedy so it goes all the way to here and stops at this blank and then stops at this blank and so that's a nice little thing find the at signs go to the the first blank blank and pull that stuff out and so that with one little match you pulled this thing out now of course we've done that before other with other techniques so that's just another way uh to pull stuff out um now if we we get this whole thing but what if that's not exactly what we wanted we can tell um we can we can give it a matching string that's different than the extracting string by adding parentheses and so here's another example where we basically say this is our string we want to match from at the beginning followed by a space followed by ignore the parentheses for the minute uh one or more non-blank characters followed by an ant sign followed by one or more non-blank characters so this is also going to if there's no from it's not going to be looking for that right so it demands the from is here so it matches that and the space is demanded as well and then it says oh non-blank character is great i got an at sign great now blank characters oops stop there and so 
this is what's going to match now the key is we don't actually want that back in our extraction what we really want back in our extraction is this part right here so what we do is we put parentheses in parentheses don't are code they're they're code in the regular expression world parentheses say start your extraction and end your extraction and so when you do this with a parenthesis when you when you do it you know without a parenthesis you get you get the whole from right without a parenthesis oh wait no okay that that doesn't have the from it so um but if you do that with the parentheses the you match the from but you only get the this bit to come out as well so you can add this to make the matching part more precise but without changing what you get returned and you specify what you want to get returned with the parentheses so next i want to show you just a couple of different ways to use these new found skills [Music] [Music] so now we want to do is use some of these new found skills in some more practical applications of regular expressions so let's go back to the way we first tore apart strings and and look at the situation where we if you recall we just wanted the host name right this is an email address and we're interested in the host name so we have this string and we go find the at right the find looks up and tells us the at is at position 21. and then what we do is we say okay let's look beyond there to the space and that tells it the space is in position 31. 
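The two ideas in this stretch can be sketched on one sample line: parentheses that narrow what comes back from a match, and the older find-based approach that locates the at sign and the following space. The sample line is an assumption in the style of the course's mailbox data, chosen so that the positions come out as 21 and 31; the slice on the last line is the natural next step from those two positions.

```python
import re

# Sample line assumed for illustration.
line = 'From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008'

# No parentheses: findall returns everything the pattern matched,
# including the 'From ' prefix.
whole_match = re.findall(r'^From \S+@\S+', line)

# Parentheses: 'From ' must still match, but only the part inside
# the parentheses is returned.
address_only = re.findall(r'^From (\S+@\S+)', line)

# The find-based approach: locate the at sign, then the first space after it.
atpos = line.find('@')          # 21
sppos = line.find(' ', atpos)   # 31

# Slice from just past the at sign up to, but not including, the space.
host = line[atpos + 1:sppos]

print(whole_match)   # ['From stephen.marquard@uct.ac.za']
print(address_only)  # ['stephen.marquard@uct.ac.za']
print(atpos, sppos, host)
```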
and then we're saying we can extract starting beyond the at sign up to but not including the space by saying at pos plus one colon space position and when we get that now we have to have a thing that decides to only look at this on from lines but then it can print out the host that is extracted from this information so that was one way that we did that right the next way we did this was the double split pattern right so we said okay let's take this line let's break it into words based on spaces that's what words is so that's 0 1 2 3 4 5 6 and then we know that the email address on lines that start with from space is the second one so we pull out the email address which pulls this bit out into email and then we're going to split that again based on the at sign so it splits right there and then this becomes the 0 and 1 in pieces and then pieces sub 1 is that host and if we print that out we get the host so that's the double split pattern nice thing about that is you don't have to keep track of positions the little plus one and the space position in that previous one is kind of annoying that's just hard to remember i've written this code way too many times in my career and i've made mistakes and i have to debug it every single time and i print all these numbers out i'm like did i get it right i did it in python i did it in java i did it in c wait a second c did it differently and so this is a lot cleaner i mean i can write this every time and i know it's going to work every time i barely even need to test this code because it's so obvious so double split is another way of extracting stuff but if we look at this thing with the regular expression we can say oh okay let's use a regular expression to do this so we'll start looking through the string by saying hey let's look until we find an at sign then let's start extracting with the parentheses and then once we have
found the at sign let's look for non-blank characters this is a set of characters and this caret as the first one means not so that's another way to do non-blank a set of characters which is everything but blank that's what this little bit is saying star means zero or more times which means it's going to run run run run run until it finds a blank which is going to stop it the greediness is what keeps pushing it right this is a greedy match that asterisk is greedy because there's no question mark after it and so it starts at the at sign with the parentheses goes to the space and that's the end parenthesis and that's what prints out now y is going to be a list that's a one item list that has the string in it that we're looking for but you just go sub zero to get that guy right out of there okay so that's sort of the regular expression version of it but we can make this a more fine-tuned thing so we can say look we also want to pick the line if we don't get that line we want to skip it if we do get the line we want to extract the data and we can do this all in a single regular expression so again we say start from the beginning of the line and it's got to be a from followed by a space and then followed by any number of characters dot star followed by an at sign so this has to match we see a space then we're going to have any number of characters and then we're going to see an at sign and then we're going to start extracting and then we're going to go non-blank non-blank non-blank non-blank non-blank oops blank stop extracting and out that comes and this has the advantage over the previous one that it's much more precise if we look at the previous one while it works on good lines it might actually trigger on lines that we actually don't want to see so this allows us to refine it so it only actually does this to lines that we care about so it's sort of both an if
statement and a splitting extracting going on all at the same time by having a bigger string that we're matching than we're extracting it's a way to kind of clean up your data so here is a simple program that we're going to just put all this together and actually accomplish something and so we're gonna we're gonna read through and look for lines in a file that have this form and we're going to extract this number and then we're going to uh compute the the maximum of this okay so we're going to extract this number and then convert it to a float and compute the maximum so you know we're going to open a file we're going to write a for loop we're going to strip so we're going to do this for every line of the file but the first thing we want to do is not get line we want to discard all the lines except ones that have this so our regular expression is look for lines that start with x dash d spam dash confidence colon so that's a pretty strong match if that's not there we're not going to get anything and then there's a space there's a space and then start extracting and then go as long one or more digits dot and dots that's a single character and that's one or more and then stop extracting so that says start extracting da da da greedy greedy greedy greedy stop extracting and so that's what we're going to get now if the line doesn't have this it means missing in some other some way whether it's this prefix or this number if the number is missing it's going to fail too we're going to get back a list an empty list so the first thing you have to do is check to see if you actually got a match so you say if the number of items in the list len of stuff is not equal to one continue and so this is the this is the skip all the lines that don't match skip skip skip skip skip skip so there could be thousands of lines that don't match but then when this match hits it's going to come down and fall through right so so that most of the lines will skip up but then when we actually get one 
and we know instantly that we've got one and stuff sub zero because that's what we extracted is this number and we can take the floating point of it we append it to our list we made a list to store them as that runs the list grows and then we just say what was the largest one and so you can run this and see that now we have an escape character and the whole idea is sometimes one of these little special characters that mean something in a regular expression is a character we actually want to search for so what if we want to search for a dollar sign well we just prefix it with the backslash and that just means this is a real dollar sign so backslash dollar is a real dollar sign so this says i would like a dollar sign followed by one or more digits or dots and so that's going to match a dollar sign followed by one or more digits dots are okay this is a set remember 0-9 or dot that's the list of legit characters this is a range of characters that's a shortcut for how to make the set you could make it be zero one two three four five six seven eight nine dot or zero dash nine dot and that's one or more so then it stops because this is a space it's greedy matching then it pulls this out so that's kind of why greedy has to be the default because otherwise if it wasn't doing greedy matching oops come back come back if it wasn't doing greedy matching it would stop here because non-greedy would find a dollar sign and one character and it would give us dollar one rather than dollar ten so in summary regular expressions are a cryptic but powerful language and they're an acquired taste i bet eventually you'll find them fun even though on your first impression you might not think that they're so fun [Music] [Music] welcome to network programs this is chapter 12.
now we're going to learn a little bit about how we talk to resources on the network using python now this is a really quick introduction to how the network really works i have a whole book that i wrote it's also translated into spanish on how the network works starting at the very lowest layer packets and everything right on up and it's actually really easy to read i wrote it for a high school audience it's a short book and pretty easy to read so if you read that book you will understand that there is this layered architecture the tcp architecture that sort of runs our network at the lowest layer that on one side here this is your computer and this is a server computer and if you sort of want a web page goes across the network does this like 15 or 20 times then it goes up into the server reads the data and then the data comes back 15 20 tops for the packets and then it's shown to you as what you see um and so that's how it works and there's these layers that we're not going to talk about in this section but i talk about in that book the layers of the link layer which talk about how to get over one hop the internet layer which talks about how to construct say 15 or so hops to get packets back and forth that's the sort of lower level bits we're going to start at what we call the transport layer and that's the layer where your computer sort of assumes that it can make a phone call to another computer another process running on a program on this computer talks to a program on this computer and then it kind of comes back okay and so we're gonna we're gonna leave this alone we're gonna ignore it we're gonna assume that there's this nice reliable pipe that's going from point a to point b and what are we gonna do with the pipe but if you're interested take a look at the book so we're just gonna start with a pipe some kind of a connection we have two processes process process and we have some kind of a connection between them and it is a connection that we can both use to 
talk and to listen in nerd terms we call these things sockets and that is one process running on one computer and another process running on a second computer connected through the internet somehow and one computer speaks into that socket and it comes out and the other computer returns something and it comes back and so this is a bi-directional flow of data which is a series of in effect data phone calls between applications so the application on your side might be your browser chrome firefox internet explorer on the other side this is a web server it might be iis from microsoft or apache or java tomcat that's another program and you are making phone calls between these programs now in general these servers stay up all the time and you just make a request when you feel like it in your program and that's what we're going to do and this is what we call a socket so that little connection that phone call that data phone call is what we call a socket now you have to decide which of the systems you're going to talk to and then which of the services on those systems or which process and so we have this concept called port numbers and they're best thought of like extensions on phones so one organization has one phone number and it says please enter the extension of the party you'd like to talk to well that's kind of what ports are they're like i'm a server and i'm connected to the internet please enter the extension of the process that you would like to talk to and so for example there might be processes running on various computers and so email is known to hang out on port 25 or extension 25 login insecure login lives on port 23.
insecure web lives on 80 and secure web lives on 443 and there's a couple of different protocols say if you have your mail stored on gmail and you have a local mail client say like thunderbird or apple mail that talks a protocol to pull that mail across and those live on various ports so these ports are those extensions and by convention we have standards that tell us what to roughly expect at those ports so when you're talking to port 80 you expect to talk to a web server or an http server if you're talking on port 23 you expect to talk to a telnet server and on and on and on and on and so these are the extensions the typical commonly used default extensions for various network application processes that are serving us data now sometimes you'll go to a url and you'll see in that url there's a colon and a number that means it's a web server that's running on a port other than the official 80 or 443 port now in python we can talk to these sockets right we can just talk to them and it's really easy surprisingly easy we have to import socket because that's a library it comes with python but until you can use you can't use it in your program until you say it and then you basically in the socket library you call socket function that's what that syntax is saying you're making a socket now the key to a socket it's it's sort of like a an unopened file handle it's half of a file handle it's an it's an outward looking thing that's not yet connected these parameters you're just going to type them in this says we're going to make a socket that goes across the internet and it's a stream socket which means that it's a series of characters that come one after another rather than a series of blocks of text there's another kind that's harder to deal with but we're going to do this so this don't worry about this line just know that this creates a socket but not does not associate it the very next line we get back a socket a socket object in this variable that i'm storing in the 
variable mysock and then when you want to make a connection across the internet to the far end you say oh hey dear socket extend yourself across the internet make the phone call to this host data.pr4e.org and on that port 80 so that's making the phone call this is like the phone number and this is like the phone extension so we haven't sent any data yet we have simply rung the phone of a process hopefully living on port 80 if it's there great this might blow up this one here won't blow up but this line here could blow up if there's nothing listening on that port it would come back and say oh you tried to call you got no answer that's a legitimate thing to happen maybe you don't have a network connection or maybe that service is down on that server or the whole server's down but it's kind of amazing that we're sitting here in python and in three lines we have the work of probably half a million engineers who built this thing called the internet all these protocols and all this software and we just made use of it in three lines of python in any case this is one of the reasons that people absolutely love python absolutely love python so now that we have a socket we have to ask ourselves what kind of data are we going to send and then what kind of data are we going to expect to receive across that socket [Music] so now we have a socket we are going to talk about what we're going to do with it right so the socket basically functions at this level your application is saying make me a socket which is sort of this end point and then the connect actually connects to an application on the far side and there's a port involved so that might be port 80 and this is the far host and that could be www.py4e.org or data.pr4e.org okay and so the socket is solving this and the question then is what are we going to send and what are we going to expect to get back and that's what we call the application protocol so we know that these two have made a phone
call it's no different than making the phone call and saying you know hello right and everyone knows that when the phone rings and you pick it up you're supposed to say hello and that's part of our protocol so who talks first right so the dominant protocol that we use in this section is the http protocol that's the hypertext transfer protocol it's dominant it's really easy to use that's why i use it as an example but realize that there are many others like mail and file transfer and remote login and all kinds of other protocols each is a different application protocol they all use sockets at their lower level but then on top of that they layer their own rules of the road http is the one for retrieving hypertext web pages and we have used it for all kinds of other things so the protocol like i said is like who answers the phone first what do they say what happens if the person doesn't answer right can you hear me now those kinds of things and it's a real simple thing and all you really need to do so that both sides can agree is write a thing that's like the rules in the middle and say okay everybody as long as we all do this we'll be fine it's as simple as picking which side of the road the cars can drive on it works fine no matter which side but if each car randomly picked it would be really kind of a mess so if you look at the typical url and this is one of the things that the web innovators in 1990 really invented that was wonderful and it seems second nature today but in 1990 it was rather revolutionary in that these uniform resource locators included in themselves a protocol the host to connect to and the document to retrieve so this is one of the clever clever ideas that the web came up with because we used to have to pick a program like ftp or telnet or whatever smtp then we had to go to the right host and then we had to talk to that host a certain way so http it's a really simple protocol invented in 1989 and
1990 by tim berners-lee and robert cailliau at cern and they created a protocol that we have grown to know and love and use for way more than retrieving documents as we'll see in the upcoming chapters so we're going to talk a little bit about what happens when you click on a page that has a link now there's all kind of fancy stuff that can go on but this is the basics and so let's just imagine for the moment you start sitting looking at a webpage dr-chuck.com page one and inside that there is a hyperlink it is an indication that says when you click on this go to a different page and in that you see the name of the page that you're supposed to go to so we click on this link and that is a browser this is an application this is a process or an app that's running on your computer this is the browser okay and when the browser sees the click inside your computer then the browser makes a connection to port 80 on the web server dr-chuck.com and sends the request this request that it sends is precisely specified by a standard which we will see in a second then the web server does some magic work oops let's go back then the web server does some magic work in here reads some files runs some code does whatever constructs an answer to our phone call and sends it back and it sends back in this case a web page in the format of html the hypertext markup language which is different than http which is the protocol that we're exchanging html is the format of the document we're getting back and this has an anchor tag href and the end of anchor tag and some highlighted text and now your browser gets this back and then renders it according to the rules of html and css and javascript etc parses it and then makes a pretty web page and this web page happens to have a link back to the first page and if you click there it will do this over and over and over again and that is the request response cycle and that's governed by a series of internet standards
these are standards that were built from the 60s 70s 80s and 90s and continue to this day by a group called the internet engineering task force or ietf the documents they produce are called rfcs which stands for request for comments the word rfc is kind of like a sort of joke as it were they're trying to be kind of funny funny is not the right word it's ironic in that they're trying to say even for the protocols of the internet that we've used for several decades they're always interested in improvements and that's what the rfc stands for and they're all named rfc dash whatever and we're going to cruise around we could find some various rfcs and this is rfc 2616 it might have been revised since then but this is like a document and this is what they look like hypertext transfer protocol http 1.1 and so you're reading this document you're gonna write a browser and you wanna talk the application protocol that is http this is one of many documents that helps define what http is so if you look down and look down you say oh here's what a request looks like this is how i'm going to get a document from the server and you keep reading and you keep reading and it says you're supposed to have the request method then a space then the request uri then a space then the http version and then the carriage return and the line feed that's what it's saying and so it looks kind of like this right we say get followed by a space then the document followed by a space there's got to be one space if you do two spaces it's going to be quite frustrating okay and so this is an example that you can run on linux operating systems and macintosh operating systems with no changes if you install telnet on your windows box you should be able to run something like this as well so telnet is a program that we used in the old days it used to be how we logged into servers but because it
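The idea that a URL carries the protocol, the host to connect to, and the document to retrieve can be sketched with the standard library's URL parser. The helper name `where_to_connect`, the `DEFAULT_PORTS` table, and the page name in the sample URL are assumptions for illustration; the default ports 80 and 443 are the ones mentioned in the talk.

```python
from urllib.parse import urlsplit

# Conventional default ports for the two web schemes.
DEFAULT_PORTS = {'http': 80, 'https': 443}

def where_to_connect(url):
    """Return (scheme, host, port, document path) for a URL."""
    parts = urlsplit(url)
    # A colon and a number in the URL overrides the default port.
    port = parts.port if parts.port is not None else DEFAULT_PORTS[parts.scheme]
    return parts.scheme, parts.hostname, port, parts.path or '/'

target = where_to_connect('http://www.dr-chuck.com/page2.htm')
other_port = where_to_connect('http://localhost:8080/index.html')

print(target)
print(other_port)
```

A browser does roughly this split after a click: the scheme picks the protocol, the host and port say where to make the phone call, and the path names the document to ask for.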
doesn't encrypt your data back and forth we don't use it anymore but it basically is a program that can open a socket to a host on a port and i'm saying telnet to this host on port 80. and at this point i am connected and whatever i type on my keyboard is going to be sent to that server now if you're doing this you probably want to cut and paste this really fast because if you take too long most web servers will be like you're a human i don't talk to humans i want to talk to programs so remember to type this fast enough and then you have to hit enter twice so you have to have a blank line here just type this exactly as it's shown and then you will get back the server if you do it right the server and the server is properly configured the server will give you back some headers and this is metadata about the document you're going to get for example it's saying it's got text html which means that the remaining stuff is going to be in html hypertext markup language it has a blank line and then the actual document and then the connection is closed and so if you do this you can set this up in a way that you can run this on your own computer and in effect hack the through the back door a web server now you can't hack the secure web servers and mail servers used to be easy to hack but they're harder to hack now because they challenge you for information but part of the reason i'm so obsessed with the command line is this is how real hackers work and they know how to talk some of these protocols more directly and so we think of this beautiful sophisticated application talking to some other thing and it's all pretty and we got wonderful clicky buttons and nice usability but the reality is like in the matrix reloaded here the kinds of things that really talented hackers are doing use command lines and and they really know what's going on and that's how they do it they understand what's going on better than the developers of the computers that are trying to be resistant to the 
hacking so i come from a long line of using the command line and that's why i encourage you to use the command line in this course so the next thing we're going to do is we're going to go up into the application layer and instead of typing those commands by hand we're going to actually send them from python and write a very simple python web browser [Music] in this section we're going to write a web browser using python so we've already got a socket we know how to write a socket in the previous section we played with the protocol and use telnet to do it by hand and now we're going to do it in python and what you're going to find is it's not that hard so here we go so the first three lines of this program import socket make the socket remember the socket isn't really got the connection so when you make the socket again we're going to make a stream based socket and it's suitable for going across the internet the connection that it's like ring phone call connect to data.pr4e.org and port 80. 
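A minimal sketch of the first three steps (import, make the socket, connect), wrapped so that a failed phone call returns `None` instead of blowing up. The helper name `call` and the timeout are assumptions, and the demonstration dials a throwaway listener on the local machine rather than a real web server so it runs without a network connection.

```python
import socket

def call(host, port, timeout=5.0):
    """Try to open a stream socket to host:port; return it, or None on failure."""
    # AF_INET: across the internet; SOCK_STREAM: a stream of bytes.
    mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    mysock.settimeout(timeout)
    try:
        mysock.connect((host, port))   # this is the line that can fail
    except OSError:
        mysock.close()
        return None
    return mysock

# Throwaway listener on this machine, standing in for a server.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(('127.0.0.1', 0))        # port 0: let the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]

answered = call('127.0.0.1', port)
connected = answered is not None       # something is listening: the call goes through
if answered:
    answered.close()

listener.close()
refused = call('127.0.0.1', port) is None   # nothing listening now: no answer

print(connected, refused)
```

Against the host used in the talk, the same helper would be called as `call('data.pr4e.org', 80)`.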
and so that basically says extend the socket across and connect to a web server and so there's got to be a piece of software running and this will blow up if the software is not running okay so now we've made a phone call now whether or not the remote side says hello is up to the application protocol and in this case the web servers say nothing and they wait for you to talk first so we're the web browser in this case and so we're going to talk first and because we read the documentation we know that we're going to send get space the url space http 1.0 and then two new lines return return remember we had to have a blank line we'll talk a little bit about this encode it's preparing the data to go across the internet and then we say send it and so this basically takes that little string and sends it across the network and then this piece of software is waiting for it and then the software goes and reads a file or does some other stuff and then it starts sending us data back which we can then choose to receive so now i write a real simple loop we're going to receive these things 512 characters at a time so we're going to loop through receiving 512 each time and if we get zero characters that means it's end of the stream the stream is closed and if you look at the little example from the previous one you saw a connection closed when the connection is closed we get an indication that it is because we ask for some data and we get zero data otherwise if there might be more data this will wait if the network is slow you'll see if you do a print statement in here you will see that this will pause from time to time on a really slow network if your network is fast it'll just go blank and it'll be so fast it won't matter but this is how we go so this is basically until the socket is closed we're going to read this data
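The program being walked through here can be sketched end to end as follows. The function names and the tiny stand-in server are assumptions so that the sketch runs on one machine without a network; the request string follows the GET, URL, HTTP version, blank line shape described above. One deliberate difference from a chunk-by-chunk print: the bytes are joined first and decoded once, which avoids splitting a multi-byte character across two 512-byte reads.

```python
import socket
import threading

REQUEST = 'GET http://data.pr4e.org/romeo.txt HTTP/1.0\r\n\r\n'

def fetch(host, port, request=REQUEST):
    """Send one request and return everything the server sends back as text."""
    mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    mysock.connect((host, port))
    mysock.sendall(request.encode())      # str -> bytes before it goes out
    chunks = []
    while True:
        data = mysock.recv(512)           # up to 512 bytes at a time
        if len(data) < 1:                 # zero bytes: the far end closed
            break
        chunks.append(data)
    mysock.close()
    return b''.join(chunks).decode()      # bytes -> str once everything is in

def serve_once(listener, reply):
    """Hypothetical stand-in server: answer a single connection, then hang up."""
    conn, _ = listener.accept()
    conn.recv(1024)                       # read (and ignore) the request
    conn.sendall(reply)
    conn.close()

# Headers, a blank line, then the document, as a real server would send them.
reply = b'HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\nBut soft what light\n'

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(('127.0.0.1', 0))           # port 0: let the OS pick a free port
listener.listen(1)
server = threading.Thread(target=serve_once, args=(listener, reply))
server.start()

text = fetch('127.0.0.1', listener.getsockname()[1])
server.join()
listener.close()
print(text)
```

Pointing `fetch` at `'data.pr4e.org'` and port 80 would talk to the server used in the talk instead of the local stand-in.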
and because this data is coming from the outside world we have to decode it before we print it and then we're all done we break out of here and we close the socket so literally that is an entire web browser written in 10 lines of python and again this is why everybody loves python so this is what this program will show if you run the get is sent it looks exactly like doing it by hand you get some headers again this is metadata that tells you something about the file in this case one of the important things is what kind of thing is coming next there's always a blank line between as a break between the headers and the actual data the metadata and the data and then here is the actual text of that romeo.txt file and then it's going to run this i'm going to print data.decode all this is coming from the print statement if you're going to parse this you have to know that you're going to read the headers up to a blank line the blank line is your indication as a software developer that the headers have stopped and the actual text begins and you know the syntax this actually could be a jpeg or png or some kind of image right and this data would here look like so if you type this and you change that code to actually talk go retrieve a jpeg url gibberish will come out okay um and so that's exactly what you will see and so now you have built a very simple web browser next i want to talk a little bit about the what happens when uh characters transition out from outside your computer i mean from inside the computer in strings out across these sockets to servers and then back [Music] so okay so now we're going to write a web browser again in python but it's going to even be shorter than what we did before we did it in 10 lines using sockets now we're going to do it in four lines with url lib so urlib really is just because the idea of opening a connection sending a get request sending the new line retrieving the stuff breaking the headers out doing all this stuff that's so common 
why not put it in a library to save ourselves some effort? So here's how we do it. We're going to import this library; we had to import socket before, but we're going to import urllib now. And this is really quite simple, elegantly simple. You say urllib, that's the library, then a module within the library, and then a function, so we call urlopen and give it the URL. Now that's a string, which it's going to encode automatically for us, so it has taken care of all kinds of things for us. It does the GET, it does the encode. Look back at that previous code; that's kind of what urllib is doing for us.

Now what urllib also does is make the connection and encode the GET request, and then at this moment it actually retrieves all the headers and keeps them for you for later. You can get the headers, but we're not going to see them here. It returns to you an object that looks pretty much like a file handle, because you can put it in the for clause after the in. Now it's going to run that loop one time for every line of this file. The lines we get back are bytes, so we have to say decode. It doesn't do that for us automatically; we have to decode them ourselves, and that's because we might need to decode them with a particular character set. Then we do our strip and we just print this out. So that's like open a file, read through it, and print it; this is open a URL, read through it, and print it. It's as simple as that.

And so that's what happens: this is romeo.txt, and it prints out. Now the thing to notice is that there are no headers here. The headers have been sort of consumed in the urlopen. Again, there is a way to say, hey, give me my headers, but for now this is just going to eat the headers and keep them, and then you get to read all the data. The loop runs four times, and I'll count
the four lines. You can go ahead and run this one; it's super easy, I mean literally super easy. You can do anything you want, I mean treat it like a file; you just have to remember to do the decode bit when you treat it like a file. So in that code we import, we open it, we make a dictionary, and we loop through. We split each line, and we have to add the decode, because that line is bytes, not a string. Then we go through our words: for each line, the inner for loop bounces through the words, and then we go on to the next line. We build ourselves a dictionary in this and print that dictionary out. In effect, other than importing this, opening it differently, and doing the decode, this is exactly how we would process a file. So by using urllib you really reduce the complexity of retrieving and reading network resources to the same complexity as reading and dealing with a file locally on your hard drive, which is kind of pretty.

So one of the things we can then do is read web pages. That was a text file, but you can get HTML too, and so here's how you read a web page. It's the same kind of code: we open a URL, this one happens to have HTML in it, we read through it, and out comes the HTML. Remember that the headers are there, but they've been eaten by urlopen for us. Now we could write a browser that would parse these less-thans and greater-thans and make links, et cetera. If you can come up with ways to find these links, you could actually write a bit of code with a loop that would open a page, pull out the links, open a new one, pull out the links, open a new one. So you could make a program that would retrieve a page, find the links in the page, and then retrieve those links.
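The word-counting walkthrough earlier in this passage can be sketched roughly as follows. This is a minimal sketch rather than the exact lecture code: the function names count_words and count_words_from_url are made up for illustration, and the URL is whatever the caller passes in. It relies only on urllib.request.urlopen from the standard library, whose return value can be looped over like a file and yields one bytes object per line.

```python
import urllib.request


def count_words(lines):
    """Count word frequencies from an iterable of bytes lines.

    Works for a urlopen() handle or a file opened in binary mode,
    because both yield one bytes object per line.
    (Hypothetical helper name, not from the lecture.)
    """
    counts = dict()
    for line in lines:
        # Each line arrives as bytes, so decode it into a str first
        words = line.decode().split()
        for word in words:
            counts[word] = counts.get(word, 0) + 1
    return counts


def count_words_from_url(url):
    """Open a URL and count the words it returns (hypothetical helper)."""
    # urlopen sends the GET request and consumes the headers for us
    with urllib.request.urlopen(url) as fhand:
        return count_words(fhand)
```

Calling count_words_from_url with the address of a small text file should give back a dictionary of word frequencies. The counting part can also be tried offline by passing count_words a list of bytes lines, which is the same shape of data the URL handle produces.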
And we'll actually build that kind of link-following program before the end of the class. Python is a very popular language at Google, and I think it's a pretty safe bet that the first crawler they wrote to crawl the web and build the index was in Python, because literally that's all it takes to read web pages and pull those web pages into your web crawler database. So I don't know, are those the first four lines ever written at Google? Who knows.

The next thing we'll talk about is how you handle that HTML. HTML is kind of yucky and nasty, and so it's not as simple as regular expressions. Regular expressions might help, string parsing and split might help, but it's just too crazy. So we'll talk a little bit about how to use a library to make HTML parsing a lot easier.

[Music]

So now we're going to talk about what you would do with a web page once you've retrieved it in a Python program. We call this web scraping. Web scraping, or web spidering, is the act of retrieving a web page, extracting the links from that web page, making a queue of unretrieved links, and then moving on. The idea is that if you had enough time, energy, bandwidth, and storage, you could eventually find your way to most of the web pages on the internet that point to or are pointed to by other web pages.

You might have all kinds of reasons to scrape data. You might have a blog that you posted. Maybe you put some data in a system, and the system is being shut down because it's being retired. You can do all kinds of things. I was talking to somebody who wrote a little thing to retrieve something, check it, and then send a text when something changed. Or you might make yourself a search engine. But be careful: not all websites are happy about you using a robot to retrieve their content. Some websites, as we'll see, demand that you log in, and they track what you do, and if they think you're
doing something bad, they will shut your account off. Other websites will track what you're doing without you logging in, but then shut your address off. So you have to be careful. You should read up and figure out which sites allow you to scrape them. I have some sites that I've set up that you can play with, so that it's legit.

Parsing HTML is difficult. For some of the simple examples you could probably write a regular expression, or certainly some splitting. What you would find is that you would write that code and retrieve your first five web pages and it would seem to work, and then it would encounter some really weird but legitimate HTML, or maybe even slightly broken HTML. The web is full of broken HTML. Your browsers just look at it and go, oh wow, more broken HTML, but they don't put up error messages, and so people just leave broken pages up. But your Python program is going to see those broken pages. So you would be like, oh, here's a new weird way to do an anchor tag, I'll change my code. Then you run for another 100 pages: oh no, here's another new weird way to do an anchor tag. The problem is that you're going to find a lot of different ways to mess up an anchor tag, and someone has already done that work.

There's a piece of software called Beautiful Soup, and we have instructions on how to install and use it. Really what it is, is somebody spent months figuring out all the nasty things that could happen, compensated for them, and gave you a nicely wrapped interface that just says: look, you give me the HTML and I'll give you back the tags. So it's called Beautiful Soup, and you have to install it. There are a couple of ways you can install it. If you're good at extending your Python, you can just install Beautiful Soup for all Python programs. If you can't change your computer's configuration, because you're on a school computer or you're
using a USB stick or something, then there's a way to download a file that I've created called bs4.zip. What you end up with is your file, called, you know, urllinks.py, and then a little folder called bs4, which has a bunch of files in it from the zip file. Then you can run it and it will pull that in. You say from bs4 import BeautifulSoup, and that's either going to pull it in from that folder, or, if you have installed it using the Python installer, it will just work and you don't have to put this folder in. So it's up to you; you can do it either of the two ways.

So this is a little bit of code. Now Beautiful Soup is a complex library, so just because this looks easy, when you're doing other things in Beautiful Soup you might have to read a bit more to figure it out. But we're going to just read this. We import Beautiful Soup, and we ask for a URL right here. We take that URL and open it: urlopen, give it the URL, and read the whole thing. That means we're not writing a loop; we've read the whole thing, which is okay as long as you know the file is not too large. Then we pass in the data we got back. This is going to be bytes, but Beautiful Soup knows all about bytes and all about UTF-8, and it figures that out. You just say, hey, take that stuff I just got, tear it apart using the HTML parser, and give me back an object, a soup object.

Now the soup object is something that you can run queries against. It parses the document, it deals with all the imperfections and inconsistencies in this HTML byte array, fixes them, and gives that back. There are various things you can do, and you've got to go look at the Beautiful Soup documentation; there could be a whole class on Beautiful Soup. So here's one thing you can do: you can sort of call this object like a function and say, hey, give me back the anchor tags. Anchor tags, of course, are the tags that say href equals blah blah blah,
slash a; all of this is an anchor tag. Then we loop through the tags, because there could be more than one anchor tag in the file, and we pull out the href; that's what this does. We loop through all the tags and print out the href. So if you tell it to go to drchuck.com, it will show you the one external link in drchuck.com. I've got an assignment that goes into that in some more detail.

This chapter has been a whole bunch of interesting stuff. We started with the TCP model and talked about sockets, which are phone calls between computers, and then how application protocols are developed to say what we say on those phone calls. We then explored the HTTP protocol, which is probably the one you're most likely to see. And then we played with all this in Python and saw that Python is really good at this. You can write extremely simple and small programs to do some extremely complex and powerful things. And again, that's why people like Python: it makes the complex simple.

[Music]

Welcome to Python for Everybody. We are going to be talking about some code. If you want to download all the code, it's right here; it's all in a single big zip file with all the sample code. The one I'm going to talk about is urllib1.py. It is not very exciting; it's short, and that's what's kind of nice about Python code. If we go and take a look at the code we played with just previously, which is the socket code, the idea here is that urllib is something Python provides for us to make socket communications and HTTP communications a lot easier. It is making socket calls underneath, but it's a library that makes this quite simple. So we have to do some imports: instead of importing socket, we import these. We create a handle with request.urlopen and just pass in a string. So we're not encoding this, we're not sending a GET command; all the
stuff we did in the previous sockets example is gone. Then we can just put this in a for loop, so we're not using the lower-level read and write code. The for loop literally reads the text line by line. The line does come back as an array of bytes, so we have to do a decode, but then we've got a string, and we can do a strip on it. So this is super simple. There we go. Now the interesting thing is that you also don't see the headers; we just read the contents. It turns out that in urllib, and we'll see this in a later, more complex application, you can get the headers if you want, and you can get various other things. So that's a simple urllib tool.

Now we can also use this in urlwords to show you something quite interesting. If you look at it from right here on, other than the decode, this is exactly the code we wrote to compute the word counts. We open something up, in this case a URL, and we create a dictionary. We loop through each of the lines in that thing, decode them, and then split them. Once you do line.decode(), this is a legitimate internal Python string. We split it, run through the words, and run the counts. So this is exactly like the code we did before to run counts. And so, python3 urlwords.py, and that gives us a dictionary, which is the word frequency. We could do all kinds of crazy stuff in here, with sorting and all kinds of things. The important thing is that, other than the need to decode these lines when you first get them, urllib makes URLs function inside Python very much like files. So these are short and to the point and very simple, and I hope they were useful to you.

[Music]

Hello and welcome to Python objects. I'm Charles Severance, and we're well on our way to getting through all this
material in Python. This lecture is in a weird place; I even debated where to put it in the book. I don't really want to teach you how to write a lot of object-oriented programs, but we're going to start using objects, and I want to be able to use the terminology. So as much as anything, this lecture is about terminology and understanding the words: things like methods and method signatures and variables and inheritance. Think of this as a terminology lecture rather than a learn-how-to-program or learn-how-to-use-this lecture. It's not something you're going to figure out right away, and there will come a time when you as a programmer really want to start using object-oriented programming. It's really a powerful and wonderful technique, but I think it's too early as a beginning programmer to say, oh, let's write a bunch of objects. So just relax and enjoy and learn this material, and think of it as sort of a theoretical thing rather than a how-to-program thing.

Part of this is that we're going to start reading documentation on how to use all these libraries, et cetera. We're going to see the word objects and start hearing it, and I want you to be able to read the Python documentation so that you understand what's going on. The word object should make sense to you even though you're not going to write a lot of object-oriented programs. It comes up on page upon page: database stuff, which we're going to talk about soon, uses objects all over the place, and Beautiful Soup uses objects. We've kind of been using them and I've been waving my hands, and I've used the word method without defining it, but now it's really time to define it.

So I want to review from the very beginning what we think of as a program. The classic program, my favorite little minimal program, is our little elevator floor converter, which converts from European elevator floors to United
States elevator floors. The key to this is that it's input, processing, and output, and this is a good way to model any program. In that process we've got variables, logic, algorithms, loops that we write, all kinds of things, and we construct a series of steps to achieve some goal. In object-oriented programming, and frankly you've been using objects all along, the program has lots of objects, and we're putting stuff into these objects, taking stuff out of one object and putting it into another. You've actually been doing this all along; as soon as you're looking at dictionaries and lists, you're using objects. An object is its own little space inside of a program that contains code and data, and all these objects are working together. It's a bit of self-contained code and data, and it is one way to take a very complex problem and make it easier, by breaking it into separate things that can be engineered and developed separately.

So you've been using string objects, or maybe you've used Beautiful Soup or something. These are powerful capabilities, and rather than having to look at all of them, it's just: hey, here's a thing, use this object, it'll do these things for you. There are lots of details inside of it; just don't look at them, don't worry about them. So there are boundaries: things that you can use, things that you can look at, and things that you really don't bother looking at. You go read the documentation, use it, and away it goes. But then someone had to write that, and so they built an object. So what we're going to do is look a little bit under the covers at what it takes to build some of these objects.

If we think of this program that originally just did processing, we can think of it as having some kind of input coming into our program. We have a string object, a dictionary object, maybe eventually some objects like a database object
or an object that we eventually define. You can think of us receiving data: it comes in as an object, which is a string object, we start putting the strings in dictionaries, we do whatever, we pull out a list of them. So you can think of data as moving between these objects. Like I say, even with strings in the first week, first lecture, first everything, we were using objects, and we've been using them all along. You can think of every string and every dictionary as a little program all by itself that has a bit of code and a bit of data. A string has the data, which includes all the characters that make up the string, but then there is a method called upper that does uppercase, or rstrip that strips off the white space from the right. So they're almost like little programs that have inputs and outputs themselves. We can make lots of them, and there are lots of cooperating objects that make up an application.

One of the nice things about the object-oriented pattern is that objects form boundaries. If you're inside the object, you can say: look, I'm going to build you a string object or a database object or a Beautiful Soup object, I'm going to build this capability and give it to you in the form of an interface, and I'm not really going to care how you use it. So we have this sort of visibility wall, where I make an object and let you use it, and the maker of the object doesn't necessarily have to know every single thing about the use of that object. Just like inside the object they don't have to worry about what you're doing with the object outside of it, when you're outside the object you don't have to worry about what's going on inside of it. We, as the user of the object, talk to its interface; we get things from it, give things to it, and use functionality within that object, but we don't have to look inside of it. We can just say, oh,
it's a nice little magical thing. We read the documentation, we read a web page, and it told us to do this, this, and this, and away you go. So it is sort of this isolation boundary that works both for the programmer who's writing the object and the programmer who's using the object, and it's a very nice pattern. You'll see how we're going to build code and group it together, and then we're going to be using it sort of as a big blob of stuff.

So, some definitions in this space, words that I want you to understand. When we're going to create one of these things, one of these objects, instances that has some data

Okay, so now that we've gotten through the definitions, let's work into some sample code. But hey, look at this: we've got ourselves a cookie cutter and some cookies. Remember that a class is a template; it's not the actual thing. An object is an instance of a class, so you have to take the class and do something to make the object. And actually you can see here some other classes: there's clearly a snowflake class and a gingerbread man class. That's an object, object, object. Somewhere out here there is a snowflake class and a gingerbread class, but what we've got is a snowman object and a snowman object and a snowman class. So class is the template, object is the instance.

So here's a bit of Python code; let's take a look at what we've got. class is a new reserved word, kind of like def. We have the name of the class, which is a name that we choose; that's the name by which we'll refer to this class for the rest of this program. It has a colon at the end of it, which means it starts an indented block, which ends when we de-indent. Inside the class there are generally two things. There is some data, and this just looks like an assignment statement

Now I'm going to talk a little bit about object life cycle. What we mean by object life cycle is the act of creating and destroying these objects. I've been using this term
constructor already. When we declare a variable, whether it's a string or a dictionary or a PartyAnimal, we create them and then they're discarded, and there's all this dynamic memory that comes and goes. We, as the writers of objects, have the ability to insert ourselves at the moment of object creation and at the moment of object destruction. We make special functions that we call the constructor (the object constructor or the class constructor) and the destructor. We don't actually call them explicitly; they're called automatically by Python on our behalf. The constructor is much more commonly used; it's used to set up any initial values of variables if necessary. Destructors we'll cover, but they're used very rarely.

So here's a bit of code that we've got. It's our PartyAnimal, and a lot of it is the same as what we've been doing so far. We have this variable x, and the constructor has a special name, __init__. Again we pass in the instance of the object, self, and in this one all we're going to do is print out that we're constructed. Here's this code that we've had before. And now we have __del__, and we pass in self, and we'll just print out that we're being destructed and what the current value of x is for that particular instance.

So let's go ahead and run this. Again, the code up to here doesn't really run anything; it just defines PartyAnimal. But this line is the constructing of it. It creates these variables, and then it also runs the constructor, so this line right here is causing the "I am constructed" message to come out. Then we do an.party() and an.party(), and that says, you know, one and two. And here's an interesting thing: we're actually going to destroy this variable by throwing it away. an no longer points at that object; an is going to point to 42.
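The program being described looks roughly like this. It is a sketch reconstructed from the spoken description, so the class name PartyAnimal, the variable an, and the exact message strings are assumptions. One caveat that is not in the lecture: the destructor firing at the very moment an is overwritten is how CPython behaves, because it uses reference counting; other Python implementations may run __del__ later.

```python
class PartyAnimal:
    x = 0  # starting value shared by every new instance

    def __init__(self):
        # Constructor: called automatically when PartyAnimal() runs
        print('I am constructed')

    def party(self):
        # First call creates an instance-level x, later calls update it
        self.x = self.x + 1
        print('So far', self.x)

    def __del__(self):
        # Destructor: called automatically when the object is discarded
        print('I am destructed', self.x)


an = PartyAnimal()   # runs the constructor
an.party()
an.party()
an = 42              # the PartyAnimal object is no longer referenced
print('an contains', an)
```

Run under CPython, the destructor message should appear before the final print, since the object is thrown away as soon as an is reassigned.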
So we're going to sort of overwrite an and put 42 in it. At that point Python's like, oh, this whole little object that I just created, somewhere out here, I'm vaporizing it and throwing it away. And so before this line completes, it actually calls our destructor on our behalf, and that message comes out. So we are allowed, as the builder of these objects, to add these little chunks of code that say: I want to be involved at the moment this object is created, and I want to be involved at the moment this object is destroyed. Now in this last line, an is no longer a PartyAnimal; an is now an integer, and it's got 42 in it. The object is gone: it was created, it was used, and then it was destroyed. So you've got to be careful; if you overwrite something, you kind of throw the object away. The constructor, then, is a special block of code that's called when the object is created, to set the object up.

We can create lots of instances. In everything we've done so far, we make a class and then we create one instance, one object, and each of these objects ends up being stored in its own variable. We have a variable an and we've been using it. But the more interesting thing begins to happen when we have multiple instances of the same class sitting in different variables, and each has its own copy of the instance variables. So let's take a look at this. In this code I've taken out the destructor, and it shows a little bit more information. Now we're going to put two variables in here: a current score, or whatever, and a name, which we start out as blank. This time we're going to add a parameter onto the constructor. self comes in sort of automatically as the object is being constructed, but if we put a parameter on the constructor call, which is this PartyAnimal call, then it comes in as the z variable. So self is the object itself, and z, this first parameter, is whatever parameter we put here. Everything we've
done so far has no parameter here, but now we have a parameter. That means that when we call this constructor, this line of code runs, and then name is no longer blank; name is going to be Sally in this particular case. Then it will say, oh, self.name, which will be Sally, has been constructed. That object is now constructed, and we put it in the variable s, and then we call the party method on that. Then we construct a different one, and this time z is Jim, and we basically have another copy of this.

So this is how it's going to look as it runs down here. When this is called, it makes one instance and stores that in the variable s. There's a variable x in there, there's a name in there, there's an __init__ method and a party method, and all that stuff is in here. This one is going to have Sally in there. Then we do another constructor, so it's going to make a whole new thing and store that in j, and this one's going to have Jim in it. On s.party(), this turns into a one. Then we call j.party(), and that turns that into a one. Then s.party() again will cause this to be a two. So what happens is we now have two objects, one in the variable s and one in the variable j, and they have separate copies of their instance variables. These are the instance variables, or the object fields, or whatever, but they're the variables. The key is that every time we do a new construction, it duplicates this and there's another copy of it. So there's an x within s: s.x is this variable and j.x is that variable.

Okay, so the next thing we'll talk about is inheritance, and that's the idea of taking one class and extending it to make something new.

So the last topic we'll talk about here in object orientation is the notion of inheritance. This is a form of code reuse, and it's one of the more
advanced aspects of object-oriented programming. So just kind of understand what it is at a high level, and then you'll know where to come back to when you need to learn a bit more about inheritance. The idea is that instead of making a new class from scratch, we make a new class by starting with an existing class. We are extending it; another word for this is subclassing. It's a situation where you're like, I've got this code and I've got this data, and I just need to add a few things to it, and then I'll have a whole new thing. As you design objects, and what we call object hierarchies, you often do this, and it's a form of really clever code reuse. But again, don't think that you're supposed to know when to use this or why to use this; right now it's just terminology. We call these parent-child relationships: the original class is called the parent, and the new class is called the child class. Subclass is another word for this: you have a class and then you subclass it. I think extending, inheriting, and parent-child are probably better ways of expressing it than subclassing.

So here's a bit of code; let's take a look at this. This code is unchanged; it's the PartyAnimal code that we've been seeing all along, the one where we construct and put a name in. Now what we're going to do is extend it, and you'll notice that this code down here is the part that's doing the extending. We're making a new class, FootballFan, and by putting PartyAnimal in parentheses before the colon, we say that FootballFan inherits everything that is in PartyAnimal, meaning the x, the name, the __init__, the party; all those methods and data are sitting there. Now we're going to add a new variable, so FootballFan, in addition to all those other variables, has points, and it has a touchdown method. In it we add seven to self.points, and then we call party, and
when it does that, it is calling this method, because FootballFan includes x, name, party, __init__, and everything, all this stuff, this constructor. So FootballFan is really an amalgamation of all these things together, while PartyAnimal is just this stuff. And we still have two classes; we don't just have one, we didn't erase the PartyAnimal class.

If we take a look at the code that we can run here, we can say, okay, let's make a PartyAnimal, Sally. That constructs an object like this and stores it in s, with an x that starts out at zero. Then we call s.party(), the party method, and that changes it to one. For this bit of code, it's as if this part doesn't matter at all, because s is a PartyAnimal, not a FootballFan.

But now take a look at this code down here. We're going to construct a FootballFan and pass in Jim. FootballFan has no __init__, so it actually uses the __init__ from PartyAnimal, because we extended PartyAnimal to make FootballFan, so we inherited all of the good that was in there. It's going to make a variable x, which is going to start at 0, a variable name, which is going to have Jim in it, and a variable points, which is going to have a 0 in it. So this j variable has more things in it than the s variable has. We can call j.party(), and that goes here and adds one to x. Then we call j.touchdown(). That comes down in here and adds seven to the points, and then calls party from within. self.party() works on the current object; that is, self and j are the same thing. So it goes up here, passes self in, and adds one to the x, in this case of this j variable, so this becomes two. And that's where it prints out, you know, seven and two, and away you go. And so it's
a way for you to take all this stuff and stuff it into a class by making a new class and just adding the extending bits, the bits that are in addition to the other stuff. Like I said, inheritance is a powerful and wonderful concept, an excellent form of reuse. But basically the whole purpose of this lecture was so that in the future I could just use these words and you would understand them. I just want to say method, and I've been saying method all along; it's high time that I defined it.

So let's just review one last time. A class is a template. It is not actually a thing; it is the shape of a thing. We define it and say: when we make one of these things, it's going to have these variables and these methods in it. Attributes are variables within a class. A method is a function that's inside of a class. An object is what we get back once we construct a class; the objects here are the snowman cookies, and the class is the snowman cookie cutter. A constructor is a bit of code that sets up our object, our instance, when it is first created. And inheritance is the ability to create a new class but import, in effect, all the capabilities of an existing class.

So object-oriented is awesome. For the rest of this class we're not going to write any object code, we're not going to use class at all, but we are going to use objects. Literally, you've been using objects from the beginning of this course. As soon as you said x equals 'hi', that's an object, and as soon as you said x.upper(), you were calling a method. You've been calling methods all along. When you do something like fh equals open of something, what you get back is an object, and then when you do fh.read() or whatever, you're calling a method with the dot operator. So you've been using objects all along; I'm just finally explaining to you what I mean when I say call the read method or call the upper method, or what's this little dot and
why is that there so again it's time for us to understand that but it will take you a long time before you encounter a problem that's large enough where as part of your solution you're going to make a new object but when you do it's really a powerful thing i mean it's a really bad idea for me as a teacher to say oh write a bunch of objects it's premature for that later is when um you will actually learn how to use objects and you'll be like oh thank heaven that these objects are here okay so uh that's all for now uh thanks for listening see you on the net [Music] so now we're going to take a look at how we deal with more than one table multiple tables because the real power of sql and the power of database performance has to do with when you start connecting tables together if you go back to that original mathematics it models data at the intersections between the rows and the columns and these intersections are the magical bits and so breaking an application to use multiple tables is an art form it takes a while there are some simple basic things that you can learn and we'll teach you here and so it's not too hard to learn the basics but then it's much more complex to be super uh skilled at it and in general advanced databases in my mind it's hard to teach advanced databases because they're always so contextually grounded uh you know something like a twitter or uh or google the databases are so specialized by the time you make them everyone can do small to medium-sized databases using the basic techniques but at some point once you escape medium-sized databases you end up in these sort of narrow things and optimize each database very separately and so i just tell people you know learn the basics really really well write programs and then go do real work but database design is the act of figuring out the data that your application is going to want to store and spreading that across multiple tables but we don't just do it randomly we do it very
much cleverly and if you look at a data model this is what it looks like and what we're showing here in this data model is we are showing uh five tables and this is a kind of a calendar kind of a system and we're seeing the columns that are in each of the tables and then we're seeing the relationships between the tables and even in these relationships there's kind of a little bit of code and when you have an arrow that looks like that there's many of those to one and this is a many to one relationship many to one relationship we'll talk all about that stuff but if you go into an organization and you have a really large and complex data application they might have something printed out on the wall that looks about like this which shows the database tables and connections etc etc and they might say oh your job is to go down and then in this little corner add one column field there and then do this and then connect it with this thing over there and then make a screen that shows all these things that pulls from this table this table this table and that table and that's your job if you're a programmer on a large software development project these database models become sort of like the core backbone of the knowledge that applications are uh managing and using so the idea is that you take your application we're going to start really simple we're going to take your application and you have to draw a picture and the basic rule and literally you could spend course upon course learning about database normalization but i'm going to distill it into one basic rule and that is never put the same string data in twice so my name charles severance if i build a database well you should go into that database and you'd say okay the words charles severance which is the name of a person me in that database only shows up once and what we do instead is we connect things together and model my name as a connection to the record that has my actual name in it
rather than putting my name all these other places so the idea is to pull duplicate data out and make only one copy of it so there is the there is the users and then there's the user's name and the user name shows up only here and everything else points to the particular user entry so that's the idea and so here is our first application we are working as a startup we just quit all of our jobs and we are going to build a music management application i mean what a great idea don't you think that'll be quite successful and so we have mocked up and we have figured out that this is what our music management application we want to track people's tracks know something about what artists and albums and genre they are and have ratings and how many times we've played them and how long they are well that's that's the data that our application needs to represent and we've done testing on this and and wireframes and everyone loves this a great user interface and so this is how it's got to look but we're going to have billions and billions of tracks in these things and so we want to come up with an efficient database to handle this and so we're going to take a look at this and look at each of the columns and we're going to ask ourselves is this column part of one of our existing objects our existing tables or is this gotta this object have to create a new table and then once we've defined those different objects we connect the tables together and model the connections now a little trick to kind of make it a little easier on ourselves is we can look in these columns and look in the columns that have duplicate information vertically that string information so rating is just a number like zero through five so we don't worry too much about integers and numbers and that kind of stuff or or whatever but we do look for strings and the problem here is we got like these strings occur many times and so these are the problems and so we we have to put these things where there is replication 
of string data kind of in the vertical dimension we have to put those in different tables and so we'll start out now the first question that you have to ask yourself when you're going to draw this picture of how this data is in multiple tables and connect it together is what is the first one that you're going to write down and this is an interesting debate and often people are sitting in a conference room and people who have experience kind of know what to do usually if it's a multi-user system like a learning management system uh the users might be the central concept perhaps the courses might be the central concept this is a single user system and so you can think well what is really this application about it's not about people it's one person but it is about tracks and so we can say okay here it will take the the track is probably the sort of most foundational notion of this application and then we can take and say okay now that we've decided that tracks are the foundational notion which of these columns are simply an attribute of the track not really and the cheap the cheating way in the easy way and this particular one is like these numbers all these numbers like this number and these numbers not that one they just go along with track and so we'll put that in we've got the track title rating length and count and we put that in and then the question is we've got the remaining things art we've got the artist we've got the album and we've got the genre and so we can say okay well we can't we've got some vertical duplication so we're going to say okay this track probably belongs to an album so let's pull out the album into its own table oops pull the album out into its own table and so pull the album out into its own table and so that pulls that out and then you say okay what would be the next thing that we're going to pull out so we pulled out the track we've got this taken care of just taking care of that taken now we've got the album well albums belong to 
artists so let's take out the artist and then we'll pick where the genre belongs and we'll just say that the genre belongs to the track and so because there might be albums with more than one different genre so each album is not necessarily a rock album it could have a rock track and a country track etc etc etc and so now what we've got is we've got four tables right we've got a track table we've got an album table an artist table and a genre table and if we sort of double check all of the columns that had vertical duplication in them now have their own little table so we can we can eliminate the next thing we'll do is to show how we're going to eliminate this vertical data vertical data replication um by showing how you represent these relationships that we just created inside of the database [Music] [Music] so now we're going to make a database we're going to use sqlite browser hopefully you've downloaded it so you can follow along and i've got this handout this basic database handout that saves you from having to type all these things so bring that up in your web browser and so that gives you all of the commands that i'm going to type now um and so you could pull them out of the either the webpage of the um you can pull them out of the slides or you can pull them out of that out of that so i'm going to bring up the database browser here database browser now the thing that's going to happen you'll see this happen on my desktop i'm going to make a new database and you have to store it somewhere and so i'm going to put it on my desktop and i'm going to call it py4e fun and so we should see a new file on my database right there py4efund now that's a file that you don't want to edit with a text editor or anything like that this is um a database that you're this this is a file that's to be read by sqlite browser and nothing else okay so we're going to create a table and i'm going to make a table called users with a column called name that's a text and a column called 
email so it's already asking me to make a table i'm going to call this users and i'm going to add a field that is called name and i'm going to make that text and i'm going to add another field called email and i'm going to make that be text now the key thing here is we are in effect making columns and rendering an opinion as to exactly what the column is supposed to be used for and we're not allowed to violate that it's not like oh we'll do whatever you want because the database is optimizing its storage based on a contract that we're effectively making with ourselves we could make these columns anything we wanted but we have to we're going to contract with ourselves and you can see it's kind of small here you can see there's a create table and that's on the slide and that's the sql way of doing that this user interface is just helping us write sql so now i'm going to just say okay and if you take a look you can see that i now have a table users and i can look at my database structure the table users and away we go and so now that is creating it and like i said here in the slides is the create statement or um on the web page there's the create statement that could have done it now we can insert some data um let's add a new record to this database users and we'll call this guy uh name charles csev at umich.edu so now we have a record so it's kind of like a database a spreadsheet now that's not the sql way to do it there's sql sort of going on in the background but if we really want to do this using sql we're going to use the insert statement and the insert statement looks like this the sql syntax sometimes has extra words insert into is actually two sql keywords the name of the table the columns and then the word values and then one to one correspondence between the columns and the values in parentheses so it looks kind of like a tuple in uh python but we're nowhere near python right now okay and so that's what we're going to
do and so i'm going to grab this kristin and i'm going to go over here to my sqlite browser and say execute sql so now i can paste that in and then hit this little run button and that's going to submit the sql to sqlite and then update that file and it says query executed successfully and away we go so if i go back now and i look at the data i see that there's two things in here and now i can actually insert all the rest of these let's go back to my little bit of stuff here let's put all these other rows in it turns out that if i go into the execute sql and i want to do more than one command at a time i can put a semicolon at the end of each one of these things and then i can run them all at the same time i mean one after another actually is what's going on here so boom boom boom and i take a look at the data and look i've got all those things in there now eventually the thing that's going to generate that sql is a program not us we're being the database administrator so we're sort of doing things manually once things get going you write programs that do that insert over and over and over again in python or a web language like php or something like that and so that is the insert now we can get rid of data and so i'm going to say delete from that's the keyword users is the name of a table where is a where clause we'll have lots of where clauses in sql it's not like an if in effect the delete is going towards the whole table and being turned on and off by this where clause so delete from users if you didn't put the where clause on will actually delete all the rows but where email equals ted at umich.edu well that one is going to make it so it only applies to the rows where that is true so i'm gonna go over here in sql and i'm going to say delete from users where email equals ted at umich.edu and then i'm going to run it because it's only one i don't need a semicolon at the end of it and now if i
go back and i look at the data ted is gone okay update so the update says update is a keyword users is the name of the table set is a keyword and then this is column equals new value and then a where clause again this update if we didn't have a where clause would change every row in the table and so where email equals csev at umich.edu oh i gotta change that because i already got the name to be charles so you see the name is already charles so i'll just execute here make this be chuck so we see it and then i run it and then you take a look at the data and it's changed that's it that's an update statement you're doing great you're doing great and so um the next thing we're going to do is we're going to take a look at how we retrieve data now this is the select statement select star you have a list of columns and star means all columns from is a keyword and then the name of a table so this select star from users is the kind of thing you type all the time as a matter of fact it's what sqlite browser is doing internally to cause this to happen but we can do it by hand by saying select star from users and then run it and so then we get a little record set that is those four records that are sitting there we can also throw a where clause on the end of it so we say select star from users where email equals csev at umich.edu and again the select star from users goes at the whole table and the where clause goes at the whole table and then filters out all of the things except one record so the where clause is send it to the table but then filter based on whatever and so it only shows us that okay we're cruising right along here you can also put an order by clause on there so we can say select star from users order by email so that's a column select star from users order by email and so that orders by email or we can change it to name and we can say descending so that's the name in descending order sorting and selecting are good things that the databases are
really good at so this is the summary of what i've told you i said the databases do create read update and delete crud and we've done all those things except we did create delete update read that's what we did and that's the summary of sql and so you might be saying why did i take so long to learn such a simple and elegant and beautiful language because it's not really exciting it's a extremely simple language that's a very predictable and you're like that's pretty easy and it turns out that some of you may have been using sql in situations maybe with microsoft access or something or actually type in this stuff and you you just kind of typed it you never realized that you were learning a programming language that's why i like sql and that's a very declarative language and it's very straightforward it's much harder to learn and i mean it's much easier to learn um sql than it is to learn python because in python you have to figure out how loops work and how iteration variables work and you'll notice there's none of that and so the but the key is is we've only started to understand the power that that's the simple ability to move around and update data and read data uh randomly using using uh these simple sets of commands but up up next we're going to look at how you do this with data models and relationships and really multiple tables [Music] hello and welcome to our chapter on databases we're going to learn a lot in this chapter uh learn a whole new programming language sql and learn how to use that so you're going to need a new piece of software to run all of the exercises that i'm going to do called sqlite browser we're using a database called sqlite go ahead and download this you might have to pause and come back if you like go to sqlitebrowser.org and download it and install it while you're doing that uh we'll talk a little bit about the history so in the old days 1960s 1970s i started doing computing in 1975. 
um we didn't have a lot of storage i mean this is you know 16 gigabytes right here and you know we didn't even have megabytes i mean the computer i had had a few megabytes of stuff so we didn't have a lot of disk drives and so permanent storage uh was often sequential and these tapes these tape drives that we had tapes and tape drives were the scalable part of storage because you could just make more tapes and you could rack them up and so that was our way of greatly increasing the storage of the computer the problem they had was they were sequential you read it advance read it advance read it advance now interestingly we've been writing programs that do this everything we've written so far pretty much reads the whole file reads the whole web page reads this everything we read we read it either in a loop or read the whole thing and that's because we have plenty of memory but we're still reading sequentially and um and so the way you would do this when you didn't have enough spinning storage or online storage is you'd use offline storage but the trick would be that you would sort it so let's imagine that you're a bank and you have a bunch of accounts only a few of which are active on any day and you have a tape that has in account number order from low to high the prior balance last night's balance of every one of your bank accounts and then you do all the transactions and you record how much money was taken in or out for each account number and then you sort those transactions and then what you do is what we call the sequential master update and that is you would write a program that would read the first transaction and hold on to it say okay this is account 45.
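The sequential master update being described here can be sketched in a few lines of Python. This is only an illustration, not code from the lecture: the function name `sequential_master_update`, the account numbers, and the balances are all made up, and ordinary Python lists stand in for the tapes. The sketch assumes both lists are already sorted by account number and that every transaction refers to an account that exists in the master list.

```python
def sequential_master_update(master, transactions):
    """Merge sorted transactions into a sorted master list in a single pass.

    master:       list of (account_number, balance), sorted by account number
    transactions: list of (account_number, amount), sorted by account number
    Returns a new master list with the updated balances.
    Assumes every transaction matches an account in master.
    """
    new_master = []
    t = 0  # index of the transaction we are currently "holding on to"
    for account, balance in master:
        # Apply every held transaction that belongs to this account.
        while t < len(transactions) and transactions[t][0] == account:
            balance += transactions[t][1]
            t += 1
        # Accounts with no transactions are simply copied across unchanged.
        new_master.append((account, balance))
    return new_master


# Hypothetical data: last night's balances and today's sorted transactions.
old_master = [(1, 100), (2, 250), (44, 75), (45, 500), (46, 20), (60, 900)]
todays_transactions = [(45, -100), (60, 40), (60, -15)]

new_master = sequential_master_update(old_master, todays_transactions)
print(new_master)
```

Each list is read from front to back exactly once, which is why the approach worked on tape: nothing ever has to be rewound or looked up out of order.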
then i would read the first account like one and it would copy one and then copy two and read like seven eight 42 43 then we'd read like 44 and then we'd read 45 but now it would change that and write the new 45 and read the next thing and so this might be 60 and it would read a bunch of stuff and copy a bunch of stuff and then would finally get to 60 and it would merge the add or subtract and so the old balance ended up here and the new balance ended up here and you had to only make one pass through the data so it was super efficient so we had all these mechanisms to sort we used to do punch cards and have sorters and all these things and then those things would run for hours and if you watch old tv shows these tapes are spinning and these things are running back and forth these are simply reading and writing tapes and that's how we did a lot of data processing because we could store far more on a tape drive than we could on a disk and with racks of tape drives we could scale the storage that our computers had and so that's the way we did data processing but it meant that the only way you knew what the old balance was was it was the balance as of this morning before the bank started you don't know what the balance was for the day and that led to things like you can never withdraw more than a hundred dollars a day or something like that because you don't know what the old balance was or you might go withdraw a hundred dollars at a couple of different branches and so they weren't able to look your stuff up right away now it didn't take long until the disk drives got better and better and better and you could store the entire accounts all the accounts and their current balances on computers and then the problem becomes what happens if sort of in the middle of the afternoon you want to update a balance well do you want to read all your data and then write a brand new one and let's say that
takes like 10 minutes that means for that 10 minutes only one person can be updating their bank balance and so because we could randomly access this data we didn't have to read it all sequentially the trick was is how do you spread the data out and then how do you make it so you can change a balance this is of course second nature today but how do you make it so you change the balance here without changing the balance there and you can have multiple people going simultaneously to these things and make sure that you can't say withdraw money at two different locations simultaneously and somehow have your bank balance get corrupted by that so there's a lot of debate on how to do that and in early days we just did sequential master update but increasingly we wanted to make better use of the random nature of our computers and our storage and so that's what led to databases databases are the science of how you make use of rotating random access data permanent data in a way that allows you to read modify and update that simultaneously from many different locations and yet keep the data completely consistent and so this led to a study of a thing called relational databases and there's relational databases are not the only databases that that happened we had many other kinds of databases and there was a debate and i remember in the 70s and the 80s there was a folks that says oh no no there you could do index sequential that's the way to do it and relational databases weren't popular weren't all that popular the first time that uh that i saw them i i didn't like relational databases but relational databases had an inherent advantage because they were based on some really powerful mathematics the interesting thing is early on the relational databases were slower but eventually they figured out how to sort of bring all the cleverness to bear to make relational databases fast and so relational databases are a pretty advanced technology and there are companies like oracle that 
are very very wealthy and their primary product for many many years was nothing more than a clever database product a clever piece of software that was really good at solving this problem and that's how important this problem was to computing if you read about databases you're going to see two sets of terminology one set of terminology comes from the mathematical background and um has to do with the underlying math things like relations tuples and attributes that's kind of like the fancy math version of it and uh programmers kind of think of them as rows and columns inside of a table and so if you look at sort of fancy theory you'll see words that look like this and they're just full of this and the connection now all this is important and true and if you really want to get good you sort of begin to understand the nature that we model data at connections or rather at sort of intersection points rather than just modeling data as a flat file the way we do but for now we're going to as programmers think of this as just like oh it's like a super fast spreadsheet the super fast part is the math for us the rows columns and tables are spreadsheets so think of a spreadsheet with sheets sheet sheet sheet and that's like a table a named thing like tracks or albums artists or genres and then there are rows and each row has a different kind of data and then there's columns and we sort of specialize the first column in many spreadsheets to say what's in there this is not really the data this is like metadata it's like the title's in this first column that's not really the data and the data starts here and we have different kinds of data like strings and numbers etc etc for each of the rows and literally you can get away with this as sort of about 80 percent of databases is just a really super cool spreadsheet but under the covers it is far more powerful than that so one of the early arguments that uh happened was again what the programming model for this was and a lot
of folks wanted a programming model that reflected how the data was actually stored um the notion of structured query language came about as a way to express what you wanted to happen and allow that to be sort of a very abstract expression select all records that meet this criteria not read read read read read read and so structured query language is not a procedural language it is a declarative language where you're simply saying what you want and then somebody writes the loop the database actually does the loop but it's a way for you to avoid actually writing the loop now that turns out to be the power of databases because the cleverness in how to write the loop is a way that you would probably never figure out how to be most supremely optimal when it comes to writing the loop as you'll see toward the end of joining many tables together and selecting and throwing away and getting down a count or whatever someone has figured out how to do that really really well so the idea was you would express you know we're going to create some data we're going to retrieve some data we're going to insert and delete it crud c-r-u-d um create read update and delete crud and so that's what this does it's a language that does this very simply now the applications that we're going to use this for are more of a data analysis application we've been doing data analysis through the whole course and the kinds of things that we'll see in the remaining chapters is we'll take some raw data file these might actually come across the network and we'll write some python programs to play with that data parse it clean it up make sense of it you know and then write it into a database and this might be a slow process this might be really nasty and this might be a way to have very clean data and then we'll write another python program to sort of read this read through it and it's all efficient and pretty and then we can produce files and maybe we'll
visualize it or do work further analysis in our excel or or a javascript visualization framework and so in this situation you will be the person who is both sort of writing the programs database administrator and you can using sqlite browser play and look at the database kind of in a raw way and the first part of this we are mostly going to be using sqlite browser just to talk straight to a database later we'll write python programs that read and write data and and visualize the data so so this is what we're going to do first and then second we're going to do this part right here that's the second thing we're going to do now another really common use of applications and something that if you continue learning more about programming is that you will want to write a an online application like amazon or a company or or twitter that's got a website and it stores dynamic data in databases and so the picture for that is similar but different than the picture we're going to start out with and so the way this usually works is that you the end user uses a web browser talks to the application and the developer writes the application software and that application software stores its data in a database and inside that database we talked to the database using sql and all the data is actually stored here and the magic happens the data server is that database software that's so precious and valuable and then there's another person often called the database administrator who has access to the direct access to the data and these roles in medium and large projects are kept separate mostly because the mostly because the the production while it's running in live the developer leaves the data alone and works on say the next version of the software and then the developer has a test version of the application that they run on their computer where they're doing all that stuff and so this database administrator is a is a role in a large project where we have to run production and keep 
production careful keep production in good shape so the database administrator has this responsibility for the production aspects of the data and you may be working in a situation where you're not actually controlling the data the database server is on different computers you have little special access and you write programs to sort of read the data and so the database administrator is the person who is asked by the organization to administer that data the data that we develop and we'll do this in the second part of these lectures um conforms to a data model that's the metadata is this an integer is this a string you know how many columns is this and the data model turns out to be very very important there's a lot of science to building an effective data model that leads to really good performance and it's a collaborative activity between the application developers and the database administrator to make it so it's efficient runs in production etc etc etc there's a lot of products out there that you may encounter we're going to be using sqlite sqlite's a little tiny database server and it's built into so many things and that's why we like it but if you're going to work at a large organization you can easily run into oracle which is the number one commercial product microsoft has a thing called sql server which is a commercial product and it's also very popular and very effective uh the more popular open source uh there's things called postgres there's mysql and mysql recently was sort of bought by oracle and there is a copy of that called mariadb that doesn't belong to oracle mariadb and so most of the sql that we're going to learn is common across these database systems because sql is a standard but then there are parts that weren't part of the original standard where each database vendor has done things a little bit different but there is a core common subset that does the basic create read update and delete operations
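As a rough sketch of that core create, read, update and delete subset, here is how the four operations might look when sent to SQLite from Python's built-in `sqlite3` module. It is an illustration under assumptions rather than the course's own code: the table and column names follow the Users example used in the SQLite Browser walkthrough, the database lives only in memory, and the address `kristin@example.com` is a made-up placeholder.

```python
import sqlite3

# An in-memory database, so this sketch leaves no file behind.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create: define the table, then insert a few rows.
cur.execute("CREATE TABLE Users (name TEXT, email TEXT)")
cur.executemany(
    "INSERT INTO Users (name, email) VALUES (?, ?)",
    [
        ("Chuck", "csev@umich.edu"),
        ("Ted", "ted@umich.edu"),
        ("Kristin", "kristin@example.com"),  # placeholder address
    ],
)

# Update: without the WHERE clause this would change every row.
cur.execute(
    "UPDATE Users SET name = ? WHERE email = ?",
    ("Charles", "csev@umich.edu"),
)

# Delete: without the WHERE clause this would remove every row.
cur.execute("DELETE FROM Users WHERE email = ?", ("ted@umich.edu",))

# Read: select the remaining rows, sorted by email.
rows = cur.execute("SELECT name, email FROM Users ORDER BY email").fetchall()
print(rows)

conn.commit()
conn.close()
```

The `?` placeholders let `sqlite3` fill in the values safely instead of pasting strings into the SQL text. The statements themselves stay within the common subset, so very similar SQL should run on the other database systems mentioned, though the Python connection code would differ.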
so sqlite is very popular you probably have it in your cell phone 10 12 times your web browser has a database engine in it your car has a few databases in it and so sqlite is what's called an embedded database system python comes with it built in you just import sqlite3 and away you go and so it's very very popular because it's free it's open source and it is such a tiny little piece of software that you just include it in other pieces of software and use it to solve the data management problems of those pieces of software like your browser might use sqlite to store your bookmarks now you think oh how many bookmarks can you have but what if you need it to be fast and what if there's like people that have 10,000 bookmarks there probably are do you still want it fast do you want to be able to search and so you get all that by using a database like sqlite and so again we're going to encourage you to download the sqlite browser so you can follow along with what we're going to do coming up next and so here is the sqlite browser here's what it looks like and it's just a desktop application and uh coming up next we'll start playing with this desktop application and see how it works [Music] [Music] hello and welcome to python objects i'm charles severance and uh we're well on our way to uh to getting through all this material in python so this lecture is in a weird place i even debated where to put it in the book i don't really want to teach you how to write a lot of object-oriented programming but we're going to start using objects and i want to be able to use the terminology and so as much as anything this lecture is about terminology and understanding the words things like methods and method signatures and variables and inheritance and so think of this as a terminology lecture rather than learn how to program or learn how to use this it's not something you're going to figure out right away and there'll come a time when you as a programmer
really want to start using object-oriented programming it's really a powerful and wonderful technique but i think it's too early as a beginning programmer to really say oh let's write a bunch of objects so just relax and enjoy and learn this material and think of it as sort of a theoretical thing rather than you know a how to program thing and so part of this is we're going to start reading documentation on how to use all these libraries etc we're going to see the word objects right and then we're going to start hearing it and i want you to be able to read the python documentation so that you understand what's going on and so you know the word object should make sense to you even though you're not going to write a lot of object-oriented programming and so page upon page upon page the database stuff which we're going to talk about soon uses objects all over the place and beautiful soup uses objects we've kind of been using them and i've been waving my hands and i use the word method without defining it but now it's really time to define it and go to it so i want to review from the very beginning what we think of as a program so the classic program my favorite little minimum program is our little elevator floor converter which converts from european elevator floors to united states elevator floors and the key to this is that it's input processing and output and this is a good way to model any program and in that process we've got variables and we've got logic we've got algorithms we've got loops that we write we've got all kinds of things and we construct a series of steps to achieve some goal in object oriented and frankly you've been using object orientation all along the program has lots of objects and we're sort of putting stuff into these objects taking stuff out of one object and putting it into another object and you've actually been doing this all along as soon as you're looking at dictionaries and
lists you're doing objects and so it's it an object is is quite a little thing it's sort of its own little space inside of a program that contains code and data and so we're working together all these objects are now working together it's a bit of self-contained code and data and it is one way to take a very complex problem and make it easier by breaking it into separate things that can be engineered and developed separately so you've been using string objects or maybe you'd use beautiful soup or something these are powerful capabilities and if you had to look at all of them it's just hey here's the thing use this object it'll do these things for you and there's lots of details inside of it just don't look at it don't worry about it and so there's boundaries that things that you can use things that you can look at and things that really you don't bother looking at you go read the documentation and use it and away it goes but then someone had to write that and so they built an object so we're going to do is look a little bit under the covers of what it takes to build some of these objects and so if we think of this program that originally just sort of did processing we can think of it as having some kind of an input right coming into our program and we have a string object a dictionary object maybe eventually some objects like a database object or an object that we eventually define and you can think of us we're receiving data it comes in an object which is a string object we start putting the strings in dictionaries and do whatever we pull out a list of them and and so you can think of data as moving between these objects and like i say even strings in the first week first lecture first week first everything we um we were using objects and we've been using them all along and so you can think of every string and every dictionary as a little program all by itself that has a bit of code and a bit of data and so a string has the data which includes all the characters 
that make up the string but then there is a method called upper that does uppercase or rstrip that strips off the white space from the right and so it's like they're almost little programs that have inputs and outputs themselves and we can make lots of them and there's lots of cooperating objects that make up an application and one of the nice things about the object-oriented pattern is that they form boundaries and within the boundary if you're inside the object you can say look i'm going to build you a string object or a database object or a beautiful soup object and i'm going to build this capability and i'm going to give it to you in the form of an interface and i'm not really going to care how you use it and so we have this sort of visibility wall where i'm going to make an object and i'm going to let you use it and the maker of the object doesn't necessarily have to know every single thing about the use of that object so just like inside the object they don't have to worry about what you're doing with the object outside of it when you're outside the object you don't have to worry about what's going on inside of it we as the user of the object we talk to its interface and we get things from it and give things to it and use functionality within that object but we don't have to look inside of this we can just say oh it's a nice little magical thing we read the documentation we went to a web page and it told us to do this this and this and away you go and so it is sort of this isolation boundary that works both for the programmer who's writing the object and the programmer who's using the object and so it's a very nice pattern and so you'll see how we're going to build code and we're going to group it together and then we're going to be using it sort of as a big blob of stuff so some definitions in this space words that i want you to understand when we're going to create one of these things one of these objects instances
that has some data okay so now that we've gotten through the definitions let's work into some sample code but hey look at this we've got ourselves a cookie cutter and some cookies so remember that a class is a template it's not the actual thing an object is an instance of a class so you have to take the class and do something to make the object and actually you can see here some other classes there's clearly a sort of a snowflake class and a gingerbread man class that's an object object object somewhere out here there is a snowflake class and a gingerbread class but we got a snowman object and a snowman object and a snowman class so class is the template object is the instance so here's a bit of python code so let's take a look at what we got here class is a new reserved word kind of like def we have the name of the class that is a name that we choose we're gonna that's the name by which we'll refer to this class for the rest of this program and it has a colon at the end of it and which means it starts an indented block which ends when we de-indent inside the class there are generally two things there is some data and this just looks like an assignment statement now i'm going to talk a little bit about object life cycle and what we mean by object life cycle is the act of creating and destroying these objects and i've been using this term constructor already and so when we declare a variable whether it's a string or a dictionary or a party animal where we create them and then they're discarded and there's all this dynamic memory that comes and goes and we as the writers of objects have the ability to insert ourselves at the moment of object creation and at the moment of object destruction and we make special functions that we call the constructor the object constructor or the class constructor and the destructor and we don't actually explicitly call them they're called automatically by the uh by python on our behalf and so the constructor is much more commonly used 
it's used to set up any initial values of variables if necessary etc destructors we'll cover them but they're used very rarely so here's a bit of code that we've got it's our party animal and a lot of it is the same as what we've been doing so far so we have this variable x and the constructor has a special name underscore underscore init underscore underscore again we pass in the instance of the object self and in this one all we're going to do is print out that you're constructed and here's this code that we've had before and now we have underscore underscore del underscore underscore and then we pass in self and we'll just print out that we're being destructed and what the current value of x is for that particular instance so let's go ahead and run this and so again this doesn't really run any code up to here that just defines party animal but this is the constructing of it and basically that really kind of creates these variables and then it also runs the constructor and so in this case this line right here is causing the i am constructed message to come out then we do an dot party and an dot party and that says you know one and two and here's an interesting thing we're actually going to destroy this object by throwing it away an no longer points at that object an is going to point to 42.
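Since the slide itself is not in the captions, here is a sketch of roughly what that code looks like, reconstructed from the description. The variable name an and the i am constructed message come from the lecture; the exact wording of the other printed messages is an assumption.

```python
class PartyAnimal:
    x = 0

    def __init__(self):
        # constructor: python runs this when the object is created
        print('I am constructed')

    def party(self):
        self.x = self.x + 1
        print('So far', self.x)

    def __del__(self):
        # destructor: python runs this when the object is thrown away
        print('I am destructed', self.x)

an = PartyAnimal()   # runs __init__
an.party()
an.party()
an = 42              # the PartyAnimal object no longer has a name pointing at it
print('an contains', an)
```

In the standard CPython interpreter the destructor runs as soon as the last reference to the object goes away, which is what the walkthrough describes. Other Python implementations may run it later, so the exact timing should not be relied on.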
so we're going to sort of overwrite an and put 42 in it and at that point python's like oh this whole little object that i just created somewhere it's out here it's vaporizing it and throwing it away and so before this line completes it actually calls our destructor on our behalf and so that message comes out so we are allowed as the builder of these objects to add these little chunks of code that say i want to be involved at the moment this object is created and i want to be involved at the moment that this object is destroyed now in this last line an is no longer a party animal an is now an integer it's got a 42 in it the object is gone it was created it was used and then it was destroyed okay so you got to be careful if you overwrite something you kind of sort of throw the object away so the constructor is a special block of code that's called when the object is created to set the object up so we can create lots of instances everything we've done so far is we make a class and then we create one instance one object and each of these objects ends up being stored in its own variable we have a variable an and we've been using it but the more interesting thing begins to happen when we have multiple instances of the same class sitting in different variables and each has its own copy of the instance variables so let's take a look at this so this code here i've taken out the destructor and it shows a little bit more information so now we're going to put two variables in here we're going to have a current score or whatever and a name and we're going to start it out as blank and this time we're going to add a parameter onto the constructor and so the self comes in sort of automatically as the object is being constructed but if we put a parameter on the constructor call which is this party animal call then this comes in as the z variable and so self is the object itself and z this first parameter is whatever parameter we put here everything we've done
so far has no parameter here but now we have a parameter here and then that means that when we call this constructor this line of code comes and then name is no longer blank name is going to be sally in this particular thing and then i'll say oh self.name which will be sally was been constructed and so then then we have this and that object is now constructed and we put it in the variable s and then we call the party method on that and we construct a different one and so this time it calls and z is jim and we basically have a oops another copy of this and so this is how it's going to look right as as it runs down here as it runs down here when this is called it makes one instance and stores that in the variable s and there's a variable x in there there's a name in there there's an init method in party and that's all in here right all that stuff is in here and now we say let's make and that's going to have a sally in there find sally in there and then we're going to do another constructor and so it's going to make a whole new thing and it's going to store that in j and this one's going to have jim in it on s party then this turns into a one and then we gonna call j party um that turns that into a one and then s party will cause this to be a 2. 
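Here is a sketch of the two-instance example as described, reconstructed from the walkthrough rather than copied from the slide. The parameter name z and the variables s and j follow the lecture; the wording of the printed messages is an assumption.

```python
class PartyAnimal:
    x = 0
    name = ''

    def __init__(self, z):
        # z is whatever was passed in the PartyAnimal(...) call
        self.name = z
        print(self.name, 'constructed')

    def party(self):
        self.x = self.x + 1
        print(self.name, 'party count', self.x)

s = PartyAnimal('Sally')   # first instance, stored in s
j = PartyAnimal('Jim')     # second, separate instance, stored in j

s.party()   # s.x becomes 1
j.party()   # j.x becomes 1
s.party()   # s.x becomes 2, j.x is untouched
```

After the last call s.x is 2 and j.x is 1, because each object has its own copy of x once party assigns to self.x.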
okay and so what happens is we have now two objects one in the variable s and one in the variable j and they have separate copies of their instance variables these are the instance variables or the object fields or whatever but they're the variables but the key is is that every time we do a new construction it duplicates this and there's another copy of it so there's an x within s so s dot x is this variable and j dot x is that variable okay so the next thing we'll talk about is inheritance and that's the idea of taking one class and extending it to make something new so the last topic we'll talk talk about here in object orientation is the notion of inheritance and this is a form of code reuse and it's one of the more advanced uh aspects of object-oriented programming so just kind of understand what it is at a high level and then you know where to come back to when you need to learn a bit more about inheritance so the idea is instead of making a new class from scratch we actually make a new class by starting with an existing class we are extending it or another word for this is subclassing and it's sort of a a situation where you're like i've got this code and i've got this data and i just need to add a few things to it and then i'll have a whole new thing and as you design objects and what we call object hierarchies you often do this and it's a form of sort of real clever code reuse but again don't necessarily think that you're supposed to know when to use this or why to use this is right now it's just terminology okay just terminology we have what call these as parent child relationships the original class is called a parent and the new class is called the child class so subclasses are another word for this you have a class and then you subclass it i think extending inheriting and parent child are probably better ways of expressing it than subclassing so here's a bit of code let's take a look at this um this is this code's unchanged it's the party animal code 
that we've been seeing all along it's the one that we construct and put a name in and now what we're going to do is extend it and so you'll notice that this code down here is the part that's doing the extending so we're making a new class football fan and by putting in parentheses before the colon party animal that says football fan inherits everything that is party animal meaning the x the name the init the party all those methods and data are sitting there and now we're going to add a new variable so football fan in addition to all those other variables has points and it has a touchdown method and you know we add seven to self dot points and then we call party and when it does that this is calling this method because football fan includes x name and party and init and everything and all this stuff this constructor so this football fan is really an amalgamation of all these things together party animal is just this stuff right but we still have two classes we don't just have one we didn't erase the party animal class and so we take a look at the code that we can run here we can say oh okay let's make a party animal sally and so that constructs an object like this and then stores that in s with an x starting out zero and then we call s party oops better change that color it starts out at zero and then we call the party method and that changes it to one okay and so this is this bit of code it's as if this part doesn't matter at all because it is a party animal it's not a football fan but now if we take a look at this code down here we're going to construct a football fan and pass in jim but football fan has no underscore underscore init underscore underscore so that actually uses the underscore underscore init underscore underscore from party animal because we extended party animal to make football fan so we inherited all of the good that was in there so there it's going to make a variable x which is going to
start at 0 a variable name that's going to have jim in it and a variable points it's going to have a 0 in it so this j variable has more things in it than the s variable has and so we can call j party and if we call j party that goes here and adds one to x right so that adds one to x and then we call j touchdown well that comes down in here and adds seven to the points right and then calls party within it and so self.party is the current object i.e self and j are the same thing right self.party and then it goes up here and passes self in and it adds one to the x in this case of this j variable so this becomes two and that's where it prints out it prints out you know seven and two and away you go and so it's a way for you to kind of take all this stuff and stuff it into a class by making a new class and just add the extending bits the bits that are in addition to the other stuff so like i said inheritance is a powerful and wonderful concept it's an excellent form of reuse but basically the whole purpose of this lecture was so that i could in the future just use these words and you would understand them i just want to say method and i've been saying method all along it's high time that i defined it so let's just review one last time class is a template it is not actually a thing it is a shape of a thing and we define it and say when we make one of these things it's going to have these variables and it's going to have these methods in it attributes are variables within a class a method is a function that's inside of a class object is once we construct a class we get back an object and so object here is the snowman cookies class is the snowman cookie cutter and a constructor is a bit of code that sets up our object our instance when it first is created and inheritance is this ability to create a new class but take and in effect import all the capabilities of an existing class so object oriented is awesome for the rest of this class
we're not going to write any object code we're not going to use class at all but we're going to use objects and literally you've been using objects from the beginning of this course as soon as you said you know x equals hi that's an object and as soon as you said x dot upper you were calling a method right you've been calling a method all along when you're doing something like fh equals open this thing what you're getting back is an object and then you do fh.read or whatever you're calling a method with the dot operator so you've been using objects all along now i'm just finally explaining to you what i mean when i say call the read method or call the upper method or what's this little dot and why is that there so again it's time for us to understand that but it will take you a long time before you encounter a problem that's large enough where as part of your solution you're going to make a new object but when you do it's really a powerful thing i mean it's a really bad idea for me as a teacher to say oh write a bunch of objects it's premature for that it's later when you will actually learn how to use objects and you'll be like oh thank heaven that these objects are here okay so that's all for now thanks for listening see you on the net

now we're going to represent these relationships in the database and again what we're trying to solve here is this notion of database normalization third normal form there is so much theory right but in this lecture i'm just going to condense this down to don't replicate string data and use what are called keys use integer keys to point at those things and we're going to use these integers then to point so assign each row an integer and then we're going to point from one row to another using those integers and so we're going to add these special key columns to each of the tables and the database will even give us help managing those now and so we
still need to keep track of you know who is the creator of the album which album a track belongs to we've got to create these relationships and we have to come up with ways to store those relationship and so the idea is is we're going to have a column in one in a table which is the key column and we're going to call this the id column and so this is a row it might have many bits of data here but in this case it's just the name of an artist so this album is going to belong to an artist and we're going to assign a number inside the database and so that led zeppelin is 1 and acdc is 2. and so we have this key this is called a primary key and then later when we want to say that the who made who album really was uh done by ac dc um we put the number two in and so the difference here is instead of saying ac dc in this record we just put the number two once we've established this number so we assign keys and then we have these pointers that point back and so that's how we model a relationship with with these small integer numbers and so there are three basic kind of keys that we use one is the primary key and that is that little id column that is just a number but once we give led zeppelin the number one led zeppelin is the number has got the key one for the rest of that database the logical key is the text area that we use that you might look up so the title of the band or the title of the album that's the logical key and then the foreign key is one of these keys that is really pointing to the primary key of another row so that's called a foreign key and it's in you might think that you want to use something like an email address as the primary key for a user table or something like that the logical key should always be separate and there should always be a primary key that integer number because things like logical keys do change people do get new email addresses and if you've got that email address as a foreign key pointing all over the place it doesn't work out so 
well and so that's why you use these small integer numbers that have no meaning outside so sometimes if you're on a system and you see a url and you see some number like 420 2016 you're like oh that turns out to probably be my primary key in their database so sometimes you can look in a url and you can see these primary keys in the url but they don't mean anything outside of that particular system so like i said a foreign key is a key that is really pointing at a row in a different table and so so we have the album has a primary key for it but the artist underscore id points to a row in the artist table as we will soon see i have a naming convention and in my naming convention on this lecture i use id for the primary key and then artist underscore id i use uppercase for the table names and then artist underscore id says this is a key this is just a key that points to the id key of the artist table and so that's what i do so you'll see in all my stuff i'll use that it's a convention it's not something sql forces you to do but you will find when you go to organizations and work on their databases these conventions are very important so i can do something and you can understand the rules in which i created it some of these you'll find this used by some people you'll find completely different conventions and that'll be okay whatever convention your organization uses learn that convention so now we're going to talk about how we put these keys in and then how we actually make the connections uh from one row to another row so now that we know what a primary key logical key and foreign key are we're going to actually start putting these together and creating tables that have these kind of values in them so when we were done we drew this picture that was sort of a logical model of how our data would be spread across four tables and how those tables are connected now we have to take this and we have to map it in a way that leads to the column row the columns and the needed 
columns in each of our database tables and so here's what we do we basically have to take and for each of these when we're going to build a track table we're going to build a track table we add a primary key so we just added an id field to every one of these things and that's so we have a place to store the sequence number of this particular row we have logical keys we've just marked those those are strings and then we things like you know rating length and count they just kind of go in here and now we have to model a relationship so we do is we in the table that the relationship starts from we put one more column in and this is the one i will name album id and that just is an integer column that's going to record the album id so there might be this might be 16 and then 16 goes in here so there's one of these columns that's a foreign key that points to this and that's why it's foreign this is a key that's not in the track table this is a key in the album table that we're pointing to and so there's a foreign key and that's what we have to do and we just do that over and over and over again and we can quickly convert that picture that was a logical picture to having every table has a primary key and every time we have a starting point we have a foreign key foreign key and then foreign key and then we mark these things as logical key logical key logical key and we'll see how we do that and so that's the picture now we have a picture of exactly how we're going to lay these tables out in the fields that we need in these tables so we're going to do a create table statement and i've got this create table statement sitting there and so this one's going to be a little bit different we're going to say create table artist and the id field is integer and it's we're going to add all of this stuff this is adding to the column to tell it additional stuff it's a primary key which means we're going to use it to look up a lot it's automatically incremented which means the database 
is actually going to provide this number for us as we insert records it's not allowed to be null it's not allowed to be empty and it's supposed to be unique and then the artist is going to have a name column that's just text so let's do that we already have our users and now we're going to do a create table in this sql and you can do that that's okay that's totally fine and we have to get this right and we say away we go and so now if i take a look at database structure i've got that users table we were playing with before and this artist table let me go ahead and delete this users table just to say goodbye okay so now we have the artist table and we take a look and it's got an id and it knows all about this stuff okay so that created the table we're going to keep doing this the next thing that we're going to show here is the foreign key right so artist id is just an integer in some database systems like mysql and oracle you would put more stuff here to say this is a foreign key blah blah but in sqlite we keep it simple and just say that is an integer column that's a foreign key the album table has a primary key and a foreign key and then the title so we'll go back and we'll grab that text out of my little page just create table go back to execute sql and then run that and we'll continue the genre table has an id on it a primary key on it you'll just copy and paste that whole thing you do that over and over and over again so we'll go in here and run that one and so the last one we're going to do is the track table and the only thing that's kind of weird about the track table is it's got two foreign keys right it's got an album id and genre id once you draw the picture you just sort of literally translate these things it's got two foreign keys and a primary key that's pretty much just like all those other primary keys and you know integer
count is an integer and length is an integer all that stuff and now we've got it so if we take a look at our database structure we're going to see that our album genre and track are all set up and these are the columns that we just made with those create statements okay so now let's insert some data this first insert statement is kind of important to take a look at so insert into by the way the keywords can be upper or lower case table name columns now this table has two columns it has id and name but we told the database that id was auto increment so it's going to actually give us the number it's going to assign the number rather than make us assign it we could make it be one two three but we say hey database you're good at this why don't you make it one two three and so there is going to be a record that it adds led zeppelin so let's take a look at that so we'll insert led zeppelin oops over to sql insert led zeppelin and run it so now if i look at database structure let's look at browse data and look at the artist table you will see that i put led zeppelin in but this id field here was auto incremented and so it was put there by the database and now when we do the next insert which is ac dc and we take a look at the data we'll see that ac dc is 2.
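The lecture runs these statements by hand in the sqlite browser. As a rough sketch of the same two inserts from Python, assuming the built-in sqlite3 module and an in-memory database (the variable names zeppelin_id and acdc_id are made up here), the id that the database assigned is available as the cursor's lastrowid attribute:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# same shape as the artist table described above
cur.execute("""
    CREATE TABLE Artist (
        id   INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
        name TEXT
    )
""")

# we only supply the name; the database assigns the id
cur.execute("INSERT INTO Artist (name) VALUES (?)", ("Led Zeppelin",))
zeppelin_id = cur.lastrowid   # id the database just assigned
cur.execute("INSERT INTO Artist (name) VALUES (?)", ("AC/DC",))
acdc_id = cur.lastrowid
conn.commit()

cur.execute("SELECT id, name FROM Artist ORDER BY id")
artists = cur.fetchall()
print(artists)
```

On a fresh table this gives led zeppelin the id 1 and ac dc the id 2, matching what the browse data view shows.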
now if you're going to write this in a program you can get these numbers back from the database in your program but i'm not writing this in a program so i have to remember that one is zeppelin and two is ac dc so i'm gonna keep myself a little cheat sheet here to remember that because everywhere else that we're gonna say led zeppelin i gotta say one now because the artist id of one means led zeppelin in those rows and so now we're going to go back and we're going to take a look at the next one and now we're going to put the genre in if you think about it we're working from the leaves out where the track will be the last table that we'll update because you have to define the keys for things like rock and metal and led zeppelin and all those other things and again even though the genre table has two columns id and name we're only going to specify the name and let the database assign the value so i'm going to insert both of these and use the semicolon trick put a semicolon here and a semicolon there and run that and so if i take a look at my browse data and i look at the genre it's assigned one to rock and two to metal i'm going to write that down one rock two metal i should have done something like rock and country because i can't even tell the difference between rock and metal but whatever my musical skill is not what's at issue in this class so now we're going to put an album in the album is the first thing that has a foreign key so if you remember the album points to artist and so that means it has a foreign key of artist id and so we have to explicitly say this because the system doesn't know which artist who made who is by but we know that who made who is ac dc and that's two and so we know what to put in artist id so we'll say insert into album title artist id and so we have to know what this two number is and of course because we have our handy little
cheat sheet we can go over to execute sql and run that and i'll put a semicolon there and a semicolon there and run it and so now in the album table we now have this and so this id was assigned and so i still have to write down that who made who is album one and album two is led zeppelin iv that makes it even more complex because the name of the album is a roman numeral four i'm sure i can figure that out okay so the next thing that we're going to do is we're going to insert the track record now if you think about the track record the track has two foreign keys and it's got a lot of stuff it's got the title it's got the rating length count but then we got the two foreign keys and so we have to know these numbers so this 2 1 this 2 1 this 1 2 we're specifying the album and the genre that this track is from by those numbers now again we have to use this cheat sheet but if this was a program the program would know that one was zeppelin and you know album one was who made who and two was led zeppelin iv and so this kind of stuff is easier for the program to understand than for us to keep track of and understand but just so we can get through these few records that's why i rely so heavily on my cheat sheet so here we are with all these numbers the foreign keys are the tricky part here everything else is really quite straightforward so now i'm going to insert four records into my track table and then run that okay so if i browse data and i look at my track table this column here this id that's the primary key of the track table and then here are the two foreign keys now the interesting thing is now there is replication in these columns but the numbers are what's being replicated and that's okay we went a long way just not to put led zeppelin iv in twice we could have made this a string but by making this an integer it saves tons of storage and makes it super
fast that turns out to be one of the key things that makes databases super fast is using these integers so we take a look at all this stuff we see that in a sense by using these little numbers we are pointing to rows in other tables the foreign keys always point to an id so these foreign keys are out here this is the primary key up here and they always point to it so now that we've carefully constructed our relationships in the tables we need to reconstruct the data to show our users and you can kind of see how you would go pull this stuff together but there's a wonderful capability in relational databases called join that brings this all back together and so we have done this for efficiency of storage efficiency of scanning etc but we do need to traverse these foreign keys at times and the database software will do this for us automatically so the join operation basically is a way to specify in a select statement that you want to pull data out of more than one table and then specifying using what's called the on clause exactly how you want that data pulled out and so here we go we already have an album table connected to the artist table by the foreign key and we want to in effect pull data from both the album and the artist the album title and the artist name and we want to show that and so we're going to say select which is the same select statement with a little different syntax this is the list of fields this is table dot field so it's album dot title comma artist dot name from the album and i always start with where the little arrow starts from album join so that it can walk down this connection from album to artist album joined with artist you don't say with you just say join and then on and then this is the condition upon which that join is going to happen when the album's artist id which is this column here the album's artist id matches think of that as is equal to or matches the artist's id and so it only
connects the rows here when there is a match between these two tables and so if we look at this we see that you know this one matches this one and this one matches that one and so the join connects conditionally it connects when the on clause is satisfied and so when this whole join runs this is what we get so you select all this stuff now this is an abstraction are you writing a loop are you doing two nested loops how are you exactly bringing all this data together we don't care about that because that's the beauty of sql that's the beauty of how we do this in a database so now we can just run this command so let's grab this command select track title genre name from track join genre that exact query the case of keywords doesn't matter and we go over here and we run this as sql and we get oops i went too far let's do this one so let's do that one there select artist name i have to add that one to my little cheat sheet the next time you see the cheat sheet it'll be right so the title so this is coming from one table and that's coming from another table okay so that's one so here is something we can do that gives us a little more detail on that so this is where the connection is so you can think of the join as sort of spreading one table and connecting it to the other table and so what we're going to show here is exactly the same the only thing we're going to do is we're going to add these two columns so you can see where the match happens and so this is one table this is another table and these are the kind of columns in common even though they're not really in common they're the columns that match this is where the on clause is happening right we have taken this table joined with this table on these two things connecting with each other in some variants of sql this would even be a where clause so you connect these two rows but only connect them when those two numbers
match so you can see i mean if we run this i'll just run this and again you just see this is where it connects okay now interestingly we can see what happens and what the purpose of the on clause is if we omit it so this is exactly the same as that previous query except there's no on clause so it's select all four of those fields from the track joined with the genre so it's basically taken the track table and the genre with a join but no on clause so it's not filtering for matches this is a match this is a match that's a match that's a match but we don't have an on clause so the matchness doesn't matter and so you're going to get all possible combinations and literally if there were you know 10 on one side and 30 on the other side you would get 300 rows in that join so it'd be all combinations except the on clause reduces the combinations and you might think whoa this is really inefficient and i will say that's what my first reaction was when i first saw this but it's not inefficient that's the beauty of abstraction that's the beauty of sql you say do it and it just figures that out so let me grab this and you will see that we can run this one as well and that kind of shows you why the on clause is important because now we have a whole bunch of these things and the on clause just filters that out so if we would just add the on clause back in then that would only show the ones we showed on the previous slide so that's why the on clause is important the join is like all possible combinations of all pairs of rows between these two tables on is oh but only where these two things match now you might think that it's inefficient but the on clause turns out to be the way it becomes efficient okay so now we're going to do the same thing where we're just going to take the track title and the genre we're going to connect that together so we select this we need to go from one table join to the genre table with an on clause and so we're
going to make those connections and the only thing we're going to look at is the title and the genre name oh oops and then run that and so we got those title and genre name now the thing you'll notice is for the first time we now have replication of string data in the vertical dimension that's okay because the data is not replicated in the database the data is now replicated as a result of the join and so we are going to reconstruct what the user wants to see the user originally all the way back at the beginning wanted to see the duplicate information in the vertical axis and now we're reconstructing it we didn't waste the space or performance in our database but we still have to show them and so now the next thing we're going to do is a monster we're going to reconstruct across all four tables and you might think this is really hard and sure it's going to be a little tricky but as long as you follow the naming convention and the naming convention makes sense we're going to do a select of the track's title the artist name the album's title and genre name from the track join genre join album join artist so the joins follow the little arrows right and then the on clause qualifies each of those arrows when to follow the arrow and then this becomes pretty easy the track's genre id that's a foreign key equals genre dot id that's a primary key that's a foreign key because i name it that way and i know that this goes to that genre table because i name it that way and track's album id is equal to the album's id foreign key primary key and album's artist id is equal to artist's id after a while you could type these pretty fast as long as you follow a naming convention and you know the naming convention so this looks like it's really hard to do but after a while it's really just a pattern so let's go ahead and run that one and it will assuming we've done everything right replicate all the data so there's all kinds of
vertical data now being replicated every column has vertical data again it's not in the database the select and the join are reconstructing vertical data as it needs to be shown to the user and so if you've been following along probably a couple hours later now we started with a picture that was our mock-up of what we wanted our user interface to look like and it had vertical stuff and we're like ah we can't put that in a database model and then we carefully built a database model that didn't have the duplicated data and then we're like ah we got to reconstruct it so we use join to reconstruct it and so after all that we ended up here with a clean little model with four tables all beautifully connected together and then we had to join it all back together so join reconstructs it and again the key is the storage is efficient the scanning is efficient and we still use the join to produce the output that we ultimately want with all the vertical replication that our users really want to see so one more kind of relationship that was called a one-to-many relationship that was actually three one-to-many relationships and the other major relationship is what's called a many-to-many relationship [Music] so our last major topic is called many-to-many relationships and up to now everything that we've done is what's called a one-to-many relationship and that is there are many tracks associated with one album there are many albums associated with one artist there are many tracks associated with one genre and as you look at data models they put little labels on each arrow that tell you which end of the arrow is the many and which end of the arrow is the one and so in this case where the foreign key is pointing there are many rows over here that point to one row over here so it's a many-to-one relationship there are various ways to draw it sometimes i'll put two arrows at this end and one arrow
at that end but whatever it is this kind of thing we've been showing is a many-to-one relationship and that's probably the most common thing but there are times when you just can't model things with a one-to-many relationship so like if you have a mother and children well that's a many-to-one relationship and that works fine but sometimes you have a many-to-many relationship like with books one book has many authors and each author has many books and so you don't have the one side there's no one and so you have to end up building a table that i call a connector table they call it a junction table on wikipedia but we need a little table that allows us to break a many-to-many relationship into in effect two many-to-one relationships and a connector table and so this is a connector table so you could think of this as you know there are many many links here but we don't have a way to model the many over here to the many over here and so what you do is you basically say oh there's many that go to the one and many that go to the one and then in here you sort of create that many-ness that you want to create so it's probably just as easy to look at a sample of this so let's imagine a learning management system where you're taking a class and there are some people that are teachers and some people that are students and many students are members of many classes a student can be part of many classes and a class has many students in it so you can't really find the one end and so what we do is we make a table called membership and in that membership table we actually often don't put a primary key in it at all we simply put in two foreign keys and if we're going to put a uniqueness constraint we put the combination of the two foreign keys as the uniqueness constraint so we say there could be duplicate user ids and duplicate course ids but there can only be you know user id course
id combinations that are unique so you can make unique cover more than one column and so if you imagine a course table and a user table there's a user id the name and email the course has a title and an id and then we have this little table that just is the connector table that points out to both and so we can expand this membership so let's take a look at how that works so we're going to create some tables and these are very classic tables because these are the one end of it so it has a primary key a title a logical key email there's a primary key for course and then there's text so we have this unique to kind of indicate that it's a logical key we're not going to allow ourselves to put any duplicates in here now the connector table here is a table member and it has two foreign keys user id and course id and you can also model some data here so i'm going to model role which is going to be zero equals student and one equals instructor and then i'm going to indicate that the primary key or uniqueness constraint is the combination of the user id and the course id now when we say the primary key it both limits our ability to insert duplicates but it also allows the database to optimize its scanning because it knows that that combination is always unique and so it can organize its disk structure and storage structure to understand how to look things up more efficiently knowing that once it's found a user id course id combination it doesn't have to look any farther because they're unique and so all of these constraints that we add speed things up save storage and make things more efficient but in ways we don't always know exactly how they happen and so let's go ahead and make these guys i think i will start with a new database i'm going to call it lms for learning management system uh no i don't really wanna do that one and so i'm not going to create the table here i'm gonna do
everything in sql and so let me see if it's in my cheat sheet nope that's not in my cheat sheet so i have to fix the cheat sheet again for you by the time you see a cheat sheet all these things will be in there so i'm going to go in here and i'm going to grab create table user actually i'm going to grab them all watch this grab them all highlight all these go over to sqlite browser blast them all in and then i'll put a semicolon at the end of each one of the statements and i want to run them so does it look good yup yup yup so i got a course i got membership with two foreign keys and i got user so that all looks good okay so now we're gonna have to insert some data in and we're gonna insert from the outside in and so we're going to just put the name and email the id will be automatically assigned for the users and we're going to do the same thing the id in the courses will be automatically assigned so let me just grab all this stuff go into sql that has the semicolons at the end already for me thank you very much now i'm going to run it and if i take a look at my data now i've got primary keys for the courses and i've got primary keys for the users and i've got nothing in the membership table and of course i have to remember what these values are because jane is one and ed is two and sue is three right and python is one sql is two php is three and so when i go into membership i've got two foreign keys here in a row and they just have to be for the course person combination and so it's a little tricky to figure all this stuff out but again these are just numbers and if you look at these numbers user id course id role well user id one is in course one as the teacher user id two is in course one as the student etc etc etc so i'm making these connections by just putting these little numbers in and once again conveniently i have all my semicolons perfectly in place so i go to sql and then i run that and then i look at my
membership data and there it is so two foreign keys and a bit of data modeled at the connection that's the way we say that the role is modeled at the connection so now that we've built all this stuff up we can write some queries that take a look at this and so what we're going to do is we're going to look at who's in what course and what role they have and we're going to sort this in a nice way so let's just take a quick look at the code we're writing we're going to do a select from three tables the user name the member role the course title so in effect we're not showing any of the foreign keys or the primary keys we're going to go from the user table join to the member table join to the course table this is pretty easy to write you know there are three tables you want to go across the on clause is also very easy to write right the on clause models each of these connections where the member's user id is equal to the user's id and where the member's course id is equal to the course id so we're going to concatenate all three of these tables together but we're going to only keep rows where it matches now this role doesn't participate in the join but we're going to print that out and we're going to order it by the course title first and then the member role second and the name third and so let's run that so we've reconnected it so ed's the teacher of the php class sue is the student in the php class jane is the teacher in the python class ed and sue are students in the python class ed's the teacher in the sql class and jane is the student in the sql class and so there are many students in many classes and we have modeled that but we model that with this sort of connector table and if you look at a piece of software that i've written called tsugi which is a standalone learning management system that's built with learning tools you will see the same thing with membership where we have a user
table we have a context which is also the course table and then we have a membership table and you look here's these foreign keys it's kind of like that's the many side that's the one side many to one and so this is now in effect a many-to-many between these two but then it's modeled as a series of many-to-one relationships and you see this all the time in all kinds of things where membership or other kinds of things are necessary many-to-one and many-to-many so with all that there's so much to learn it's both easy and complex at the same time it's easy when someone shows you how to do it but at some point you will learn how to build database models and you realize oh it wasn't so bad it takes a while to get used to them this really just is a quick walk but the bottom line is what we just did seems like wow that's nice do you really have to do that and the answer is if you're going to scale at all you absolutely have to because you simply can't read and write data sequentially you can't update one little piece of data in a file by reading all the way through and then writing a new copy of the file that could take seconds and in a system like an online system you get a hundredth of a second to do something like that and the databases make it so that happens in a thousandth of a second so ultimately you simply have to take advantage of this if you're going to modify data you can read data from flat files but even if you're just going to read a lot of data if it's big it slows down terribly so it might seem like there's a trade-off that you could debate whether this is worth it but if you're going to deal with a lot of data you've got no choice it's really not as much a trade-off as you think so this has been a quick romp through databases we talked a little bit about indexes there are constraints we talked a little bit about the not null stuff we've talked about the uniqueness
that's a constraint another whole area is what's called transactions and that's the locking of little areas so you can read an area then lock it and then update it to make sure no one else reads it in between and so they make sure they either get the version before you change it or after you change it and so that's how you make sure that you can't do things having to do with bank account balances and get yourself in trouble so there's a lot to sql it's really fascinating sql is a fascinating thing to use and learn and performance tune and enjoy so relational databases are cool this gets us started the big thing is don't allow replication vertically of string data pull that out into a separate table establish a primary key and then have foreign keys that point to that primary key it is not just how much data you store it's sort of a way of compressing data you might think strings take no data but they do numbers take a lot less data and it's both how much data is stored but also how much data has to be scanned and that's why joins work that's part of the magic of why oracle is such a successful company it's a bit of an art form and it's something that you can work on your whole life and always get better at [Music] hello everybody welcome to python for everybody we are doing some code walkthroughs if you want to follow along with the code you can download the source code from the python for everybody website okay so the code we're playing with today is twfriends.py and this is a step beyond the simple twspider it is a restartable spider but we're going to data model things a little bit differently we're going to have two tables and we're going to have a many-to-many relationship except that it's sort of a many-to-many relationship between the same table which is okay twitter friends are a directional relationship and so we start out here in
twfriends.py remember the file hidden.py i'll show it to you but i'm not going to open it because i've got my keys and secrets in it so this hidden.py file you've got to edit that and you got to go to apps.twitter.com and get your keys and put them in there otherwise these things won't work but if you have twitter and you set your api keys up and you put them in hidden.py then all these things will work it's kind of fun actually and impressive not hard to do actually so the twitter url that's my library that reads hidden.py and augments the url and does all the oauth stuff json and ssl because python doesn't accept any certificates even if they're good certificates so we kind of crush that here's our friends list url that we're going to hit we're going to make a database friends.sqlite now here we're doing create table if not exists so what this really is saying is i want this to be a restartable process and i don't want to lose the data we're starting out and we do not have any sqlite files and so this is going to create the database and create these tables but the second time we run it we're not going to recreate the tables we're going to be able to restart this because we're going to run out of rate limit before we finish this so we just have to wait however long the rate limit takes to reset and we'll watch the rate limit go down and so we're going to have a people table and we're going to have an id a primary key and the name the name is going to be unique and whether or not we've retrieved it and that's kind of from a previous one but then there's the who follows who the from id and the to id and so this is a direction and we're going to put a uniqueness constraint in just like we do in many-to-manys that basically says the combination of from id and to id has got to be unique we don't allow ourselves to put duplicates of the combination so from id can be
one in many records and to id can be one in many records but one one is only allowed once and this is the crud we have to do to convince python to accept the twitter certificate and so this is similar to some of the other stuff that we've done we're going to enter a twitter account or quit and if we hit enter by itself then we will actually go and retrieve a record that was not yet retrieved and now we're actually pulling out two values id and name and so fetch one is going to give us a two tuple basically and we're going to store that in id and account so this is coming back with a two tuple first of which is the id from the database limit one means we're only going to get one of these or zero of these if there are zero of these that means there are no unretrieved twitter accounts retrieved equals zero well you'll see in a second that all the new accounts we put in are the ones which we haven't retrieved and again given our rate limit we want to know which ones we've retrieved okay and so what we're going to do next is we're going to check to see if the person that we just entered which means the length of the account is greater than zero we're gonna check to see if they're already there okay and we're gonna select id from people where name equals so that's the one we just entered and we're going to fetch one and grab the first thing because we only got one thing in the select statement here and if this person that we just asked to see is not in the table that means this is going to fail and we're going to do an insert or ignore this or ignore is kind of redundant because we just checked to see if it was there but we'll put that in just to be safe and we're going to put the name in as the new account that we're looking at and we're indicating that retrieved is zero so that we will know that we haven't retrieved it yet you'll see that we'll
update that in a second we commit it so that later selects will see this so you gotta do the commit otherwise the later select wouldn't see the one we just inserted and we're gonna ask how many rows were affected and if it's not equal to one then we're going to complain otherwise we inserted it and we are going to do this thing we're going to ask hey remember there was an id up there right here id integer primary key and we did not insert this here but we want to know what that id is and every time i was showing you that in lectures i was saying it's really easy in python to do this and that's what we're saying this cursor did the insert and one of the things that happens is after the insert we're going to grab the last row id which is the primary key that was assigned by sql okay and so that means that coming through this code here in line 45 one way or another we're either going to know the id of the user that was there before or we just inserted one and so we're going to know the primary key of the current user and you'll see why we need that so id is the primary key of the current user that we entered right here okay and now what we're going to do is do the twitter url augment with the oauth and all the keys and the secrets in hidden.py and we're going to go through let's count 1000 let's go what the heck let's go 200 up to 200 friends save no let's do 100.
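the look-up-or-insert step described above can be sketched roughly like this it is a minimal sketch and not the actual twfriends.py code the table name People the helper name get_or_create_person and the in-memory database are assumptions made for illustration while the column names id name and retrieved follow the walkthrough

```python
import sqlite3

# Assumption: an in-memory database so the sketch leaves no file behind;
# the walkthrough itself uses a file called friends.sqlite.
conn = sqlite3.connect(':memory:')
cur = conn.cursor()

# IF NOT EXISTS is what makes the process restartable: a second run keeps the data.
cur.execute('''CREATE TABLE IF NOT EXISTS People
    (id INTEGER PRIMARY KEY, name TEXT UNIQUE, retrieved INTEGER)''')


def get_or_create_person(cur, name):
    """Hypothetical helper: return the primary key for name, inserting it if needed."""
    cur.execute('SELECT id FROM People WHERE name = ? LIMIT 1', (name,))
    row = cur.fetchone()
    if row is not None:
        return row[0]  # the account was already in the table
    # OR IGNORE is redundant after the check above, but harmless.
    cur.execute('INSERT OR IGNORE INTO People (name, retrieved) VALUES (?, 0)',
                (name,))
    if cur.rowcount != 1:
        raise RuntimeError('Error inserting account: ' + name)
    return cur.lastrowid  # the primary key SQLite just assigned


first_id = get_or_create_person(cur, 'drchuck')
conn.commit()  # commit so later selects see the new row
second_id = get_or_create_person(cur, 'drchuck')
print(first_id, second_id)
```

either way the function comes out the bottom with the primary key of the account which is the property the walkthrough relies on when it later inserts rows that point at that id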
we'll keep it that way and then we're going to retrieve it and we're retrieving the account we're not going to print the nasty url out we could then we're going to open the url with the connection and then we're going to read that and we're going to get the utf-8 data from this and then we're going to decode that and we're going to have the unicode data so the data string is an internal python string with all that data representing all the wonderful characters and of course we're going to ask urlopen to give us back the headers as a dictionary using this call and we can see how many we have left what's the remaining rate limit that we have okay and so then we're going to parse the data with json.loads oh wait i need a continue in here continue okay save if we fail to parse this data we'll print it out right so that means that this died which means it's not syntactically correct json basically and who knows if we're ever going to see that but at least when it blows up it'll print this data out we'll catch it and then it'll continue actually i'll make this a break because if that's blowing up that bad we should quit now i don't yet know what happens when this rate limit says you can't have it but i do know that i expect when it's successful that there will be a key of users in this outer dictionary that we're going to get and if users is not in the parsed dictionary then i'm going to dump out this data so that at least i can debug what happens when i've got some broken json so the difference is this code is going to fail when the json is syntactically bad meaning a curly brace isn't right or whatever and this code will trigger when i get good json but i don't have a users key in it okay so then once we've retrieved it and we're pretty happy with it we're going to update for the account that we are retrieving we're going to set this
is one of our retrieved accounts okay and then what we're going to do is write a loop that goes through all the friends of this particular user that we're asking about and gets their screen name prints it out and then we're going to check to see if this one is already in our people database because this is a spider we're grabbing accounts and so we'll do a friend id and do a fetch one grab the sub zero thing and if this person's not in there this fetch one is gonna blow up which means we're gonna drop down to the except code but if it does work we have friend id and they're already in our database right they just weren't retrieved okay and so now if the friend id wasn't there we're going to do an insert into setting retrieved to zero and then we're going to commit right now remember row count is how many rows were affected by this last transaction cur.rowcount and we're going to die if that insert doesn't work this is unlikely unless somehow we've run out of disk drive or something and we're going to grab the friend id as the key of the last row that was inserted we're only going to insert one row so it's basically the primary key of the row that we just inserted so if you look at this code right here it comes out the bottom one way or another with friend id set right friend id is either they're already in our database or they're not and if we insert them then we have it and so now this count new and count old is just so i can print out a nice printout now we are going to insert into the friend table which is called the follows table in this case from id and to id those are the two outward pointing foreign keys and we have the id of the account that we are retrieving the friends of and then this particular friend and so we're inserting the connection from this person to that person and then we commit it we want to commit these again so that later selects when
the loop goes back up later selects get all of that data that's going on okay so we do want to commit from time to time and then we close the cursor at the very end okay so let's run this and see what happens okay so python twfriends.py oh of course i am a refugee from python 2 so i always forget to type python3 okay so we're going to start if we take a look right now i'm going to start another tab over here and ls minus l star sqlite now that sqlite file is there right and it's actually made the tables if you go up here it ran all this stuff created the tables yada yada and we're sitting right here at this line as a matter of fact i think without causing too much trouble i can open that database and get into this database right here and there is no data in the follows table and there is no data in the people table it's completely empty okay so we're waiting for the first one and i'll go with mine dr chuck so it's retrieving the 100 friends and they all were brand new they're all inserted right and so now if i hit refresh we will see that dr chuck is retrieved who follows so these are all the people i follow one follows two so if we look at here we see that dr chuck follows stephanie teasley because we grabbed the friends of dr chuck you know we're going to have a record in follows for all the ones that i did right so these are all the people i follow and we put them in okay so we can go back and let's see grab somebody let's go grab stephanie teasley and let's pull out her friends so we grabbed a hundred of her folks i got 14 left that's my x rate limit so i did stephanie teasley so let's go back here so you'll notice there's 101 there's probably going to be oh 182.
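to read a row like one follows two as names instead of numbers you can join the follows table back to the people table twice once for each foreign key the following is only a sketch under assumptions the table names People and Follows the aliases src and dst and the placeholder account names friend_a and friend_b are made up for illustration and an in-memory database stands in for friends.sqlite

```python
import sqlite3

conn = sqlite3.connect(':memory:')  # assumption: stands in for friends.sqlite
cur = conn.cursor()

cur.executescript('''
CREATE TABLE People
    (id INTEGER PRIMARY KEY, name TEXT UNIQUE, retrieved INTEGER);
CREATE TABLE Follows
    (from_id INTEGER, to_id INTEGER, UNIQUE(from_id, to_id));
''')

# Placeholder accounts; ids are assigned 1, 2, 3 in insertion order.
cur.executemany('INSERT INTO People (name, retrieved) VALUES (?, ?)',
                [('drchuck', 1), ('friend_a', 0), ('friend_b', 0)])

# The repeated (1, 2) pair is dropped by the UNIQUE(from_id, to_id) constraint.
cur.executemany('INSERT OR IGNORE INTO Follows (from_id, to_id) VALUES (?, ?)',
                [(1, 2), (1, 3), (1, 2)])
conn.commit()

# Self-join: People appears twice, once per foreign key in Follows.
cur.execute('''SELECT src.name, dst.name
    FROM Follows
    JOIN People AS src ON Follows.from_id = src.id
    JOIN People AS dst ON Follows.to_id = dst.id
    ORDER BY dst.name''')
rows = cur.fetchall()
for follower, followed in rows:
    print(follower, 'follows', followed)
```

because the relationship is directional each alias has a fixed job src is always the account doing the following and dst is always the account being followed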
uh that's interesting so we've retrieved dr chuck and stephanie teasley and let's go take a look in the friends table the follows table okay so we have all of people i follow now all the people stephanie follows okay so there we go so let's go ahead and do somebody else um let's see i think we both follow tim mckay where's tim mckay yeah let's follow tim mckay let's see what who tim follows see if we can get like an overlap oh we revisited some let's see if we can see this in the follows let's see people so we've got dr chuck retrieved into my case somewhere down here yeah it might take us a while before we get any really good overlaps uh let's see let's do a database call let's see let's do a database sql select count okay so let's just run this some more it's clearly working now one thing i can do here is i can hit enter and it will just pick one randomly so it grabbed live edu tv and i can and let's see how many i got left we got 12 left and now i can hit enter again and it picks another one uh that was the next one i was kind of picking them in order is it picking them in order let's go to people yeah it's picking these so it's gonna we can see that it's going to just do the first unretrieved person who's nancy let's let it retrieve nancy so it grabbed nancy new so we're finding some and this table is getting really big and so if we look at the people table we now have 455 people and we have 467 following records and so there we go oops hit enter it does another one and away we go so you get the idea i can type quit to finish and just to give you a a a little interesting a bit of code to show you how to do selects i'm going to do this tw join now you'll notice that we're not talking oh let's show you one thing um ellis by nsl friends um star sqlite so this database has it so i can restart this process and run it again and the database is still there and so we just grab a swear trek um and so we can keep doing this and and so this data it keeps extending and so 
this is a restartable process i can run it and then tell it to grab the next unretrieved one and so away we go right and um so that's part of it so if i run out of my uh i've got eight left oh how many do i have left really let's keep going how many do i got left i got five left okay wait oh i guess we'll just run it out so i got four left you know what i should do is i can change the code yes i can change the code i can stop the code and i can quit the code so what i'm going to do is i'm going to change this code a little bit really quick and i'm going to print the headers of rate limiting at the beginning and at the end so now i can run it again i changed the code hopefully didn't make a python error delta go get another one and a navarro and so i got three left oops we'll see what happens when i run out of rate limit so we have one left hit enter hit ctrl k open source.org so we have zero left that worked now let's see what happens i don't know what happens next oh we blew up too many requests oh we got an http error 429 so that means that going for mark cuban uh that was in line 48 so the right thing to do would be in line 48 um we should really put this in a try except block because it gives us an error uh print oh fiddlesticks how do i print the exception message i always am forgetting print failed to retrieve okay so we'll put that in now if i run it and then i have to put a break here because that's not a good break failed to retrieve not got to figure out oh see i never know how to print out the error message yeah so i never remember see that's the weird thing about stuff is that i don't ever remember enough i don't remember the syntax what i say here to print the error message out so i'm going to go to google and i'm going to say print out the exception message in python oh python3 hello okay so let's go find it here in the
documentation except except is this it is this what i say i just want to print out the message ah that's it except let's try this so this is part of python programming for me at least because i'm just not like a genius expert at this stuff [Laughter] this is one thing i like about python is you can guess stuff and sometimes you guess right so there we go we got the error we got the nice little error message and we see error 429 too many requests so that cleans that up nicely so we have run out of requests and on that it is a good time to say thanks for listening and uh i hope that you found this valuable [Music] so hello and welcome to our final chapter retrieving and visualizing data in this chapter we are going to basically bring this all together databases web services code loops logic and we're going to solve a problem that is a multi-step data analysis we're going to find some data on the internet might be html might be an api or whatever and we're going to write a relatively slow process that's going to pull data slowly because these are all rate limited this is a slow and restartable process so you have to start this and what we're going to do is we're going to have a database that's going to hold the data that we're pulling and so this might take several days actually if you really have to do it and then you'll build up your data in your database and then what you tend to do is you tend to produce two databases one is kind of a raw database where all of its data columns are aimed at helping you figure out what you've retrieved already and what you haven't retrieved yet so that's kind of a crawling spidering process and then you find that the data is kind of nasty and ugly and you find that before you're going to do any analysis you probably want to clean and process it so in a lot of these you're going to go from a raw database to a clean one and this is going to be really large and this is
going to be really small and and you're going to do this sort of once but slowly and you'll do this as many times you need changing this program cleaning the data up over and over and over again and then you'll end up with really clean data and that's relatively small and you might run programs that loop through this to do visualizations or analysis or some things or whatever and so you'll actually sort of use this database as a source of information okay so that's the basic pattern of what we're going to work with now this is what i call personal data mining and if you're going to do this seriously python is used in lots of data mining activities but if you're going to do data mining seriously with really really large data sets we're doing uh small to medium-sized data sets as you might do sort of for individual personal research versus like an organizational research where you're processing the logs of a web server or something like that and there's lots and lots of wonderful technology and what's really cool is this technology just keeps getting better and better because the whole data mining data analysis uh natural language processing field is just so hot right now it's so awesome we're gonna keep it simple and do stuff for ourselves for now and um and and i gave you a bunch of sample code that's going to make it so that you can adapt this sample code to solve the problems that you need to solve so like i said this is more of a programming exercise data mining might be a lot more complex if you're doing simple research this might actually model what you do pretty well so the first thing that we're going to do is what's called use the google's uh json api for geocoding and there are two versions of this one version requires a key and one version doesn't require a key uh google used to make all this data available for free but with just a rate limit but now they're making increasingly requiring the key so i give you code in this zip file that kind of does both 
if you really want to do something in production of taking user entered places and names and getting precise latitude longitude coordinates so you can produce a nice little google map like this um but since google has made this a rate limited api i've actually pre-spidered a copy of the google data and i have my own sort of fake google api and so you can do your assignments and test all your code using my fake api um which has no rate limits and has no problems but it's only a limited set of the data and so this is the basic process and it follows that basic personal data mining pattern and so here's this api which is either google or me i've got my own dr-chuck.net version of this and there is an input queue of the locations so this is the user data where they just put in the name of where they think they live university of uh to begin or something and um so this is the queue of the things that are to be retrieved and in my case when i built this map for the first time there was like fifteen thousand and it took me days to get this and so it would stop and so what i would do is i would you know read the first one into this geoload.py check to see if i already had it in my database if i didn't already have it in the database i would go to the api pull the data down and i would put it in the database and then i'll go to the next one the next one the next one so you know i might get a thousand in my database and then it blows up or i'm told i can't go any further so i wait 24 hours i start it up and it reads the first thousand and says oh they're in the database already and then it starts at 1001 and then it adds that and adds that until it stops and so it took me several days of processing to get this data right now i didn't have a separate cleaning process because this data is pretty simple i was pulling out the json and latitude and longitude etc
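The check-the-database-first loop described here could be sketched roughly as follows. This is only an illustration of the restartable pattern, not the actual geoload.py: the Locations table, the load_locations function, the limit parameter and the fake_fetch stand-in for the geocoding API are all assumptions, and no network call is made.

```python
import json
import sqlite3


def load_locations(conn, addresses, fetch, limit=200):
    """Fetch data for addresses not yet stored; return how many were fetched."""
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE IF NOT EXISTS Locations (address TEXT UNIQUE, geodata TEXT)"
    )
    fetched = 0
    for address in addresses:
        cur.execute("SELECT geodata FROM Locations WHERE address = ?", (address,))
        if cur.fetchone() is not None:
            continue  # already in the database from an earlier run, skip it
        if fetched >= limit:
            break  # stand-in for hitting the rate limit; a later run resumes here
        data = fetch(address)
        cur.execute(
            "INSERT INTO Locations (address, geodata) VALUES (?, ?)",
            (address, json.dumps(data)),
        )
        conn.commit()  # commit each row so a crash loses at most one retrieval
        fetched += 1
    return fetched


def fake_fetch(address):
    """Hypothetical replacement for the real API call; returns made-up data."""
    return {"address": address, "lat": 0.0, "lng": 0.0}


conn = sqlite3.connect(":memory:")
queue = ["place one", "place two", "place three"]
first_run = load_locations(conn, queue, fake_fetch, limit=2)
second_run = load_locations(conn, queue, fake_fetch, limit=2)
print(first_run, second_run)
```

The second call walks the same queue from the top, skips the two stored entries and fetches only the remaining one, which is the behavior the lecture describes for restarting after a 24 hour wait.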
and so i didn't have to do two separate processes to clean this data up it was clean enough right as i pulled it because i was talking to an api if you're talking to the html sometimes it gets nasty and ugly and so then i wrote this program that just reads through it it just does a select and you know reads through the stuff and it prints out some summary information and tells you what to do it also prints out you'll see this pattern because um you know i'm i'm visualizing using browsers html and this happens to be using the google maps api and putting all the data in a little javascript file so these end up being uh assignment statements in javascript you can take a look at that file and uh all the data shows up as assignment statements in the javascript and then when this html loads it reads this file and puts up all those pins as long as you have access to the the in browser uh javascript api so the next thing we're going to talk about is pagerank which is spidering now html we talked a lot about this spyder html get some links and so up next we're going to actually build a real database full featured search engine using pagerank [Music] so now we're going to write a search engine doing some of the things we're going to do page rank and we're going to visualize it in a in a web browser and show the weights we're really only going to do page rank on one page because you want to have links that more than one page that points to this to a page so that you can figure out which pages are more or less important and then visualize it we'll run the page rank algorithm and we'll separately do all this so at this point we're going to do pretty much the web crawling the index building and the searching we're not going to really search it we're going to visualize the index but you could write a simple program to do searches for keywords and figure out which page was the most likely page for a keyword and that that would be a fun additional thing to do so the web crawler is 
this program that hits a page pulls down the html parses the page looks for links makes a queue of incoming links that are as yet unretrieved and i'm going to do this in a simple sqlite database and the database basically starts with one link as the starting point and then it retrieves that page and then you see the database end up with lots of unretrieved pages and it goes back in and picks a random page and retrieves that one and then it just expands and expands this code that i've built that you're going to play with only stays on one website otherwise it would go crazy but of course google doesn't use an sqlite database running on your hard drive but you'll get the idea you'll see this thing exponentially gain links and you'll run it for a while pull down a thousand web pages or whatever but of course make sure that you uh don't violate any terms and conditions and again i've got some data sources that you can use and they're not rate limited but you can also use things like wikipedia which i think they sort of discourage or dr-chuck.com which has no rate limit or who knows what right so just be careful don't do this on facebook and don't do it on google don't get yourself in trouble and if you're using you know an internet connection where you're paying for bandwidth uh be careful so this is the idea of the web crawler and this isn't my picture this is the classic picture of a web crawler read a page parse it take all the urls and stick them in a queue grab again and again so for us the scheduler is going to do it as long as you say oh do 100 pages or it runs until it blows up i mean and again these processes that have the network in the loop it's really important that they behave well when they blow up and that's why databases are so useful because you can be writing along to the database and some random thing happens and blows your program up and you start over so you're reading these things
you're storing each page building up your storage et cetera et cetera so you just keep on doing that and with this program you'll be able to retrieve some stuff then run the page rank then you can retrieve some more and then you can run some more page rank and you can kind of see how google sort of evolves its index over time of course we're so much simpler and like i said be careful when you crawl um you're going to run a crawler that just goes as fast as it can um but google doesn't do that it's careful not to overwhelm any websites it's trying to be smart about the use of your bandwidth on your website there is a file our code won't bother looking at this but there's a file called robots.txt that real web crawlers look at and it gives a list of the things you are allowed to look at and not allowed to look at and so if you go to google and you see a search result that says we're not allowed to show you the summary text of this page because of the robots.txt it's there and you can go and you can actually see a robots.txt just go to any website it's at the top root blah blah blah blah blah slash robots.txt it's not in a path it's not slash this slash that slash something else robots it's at the very very top of a website the index building uses the page rank algorithm and the whole goal of page rank algorithm is to figure out which pages have the best links so having the most links is really easy you can just say how many links go to this but the problem is you got to figure out the value of those links and then how do you figure the value of those links by looking at how many good links come to it so it turns out that it's an infinite problem it's an infinitely difficult problem to use pagerank but you can approximate it and what happens is after a while it converges to a reasonable value and so we're going to run the search index and each time it runs you're going to see that it says you know how much did these numbers
change and what happens is in the beginning they change very wildly but quickly they flatten out and the best way to think about uh the page rank is think about how water runs where um you have a small little stream going by a house and sometimes it rains sometimes it's dry and there's like a little lake and the stream is always running and it doesn't go up and it doesn't go down it might go up a little bit if it rains a lot but in general there's sort of a steady state meaning that whatever water is coming in is about the same as the water going out so we think about this in terms of web pages the value of the links coming in is roughly the same as the value of links going out so when that starts to balance the in and the out value from each of the nodes then uh you've got a pretty stable state and so what google does is they have a relatively stable assessment of goodness and value of pages and they use that to compute page rank and then they throw a few more pages in and it kind of has to adjust for a while but it reconverges and so this is a calculation that generally converges um and it doesn't vary wildly and that's why you know google's pretty good at kind of arriving at the true value of something so let's take a look at uh what we're going to do in this application again we have a file that is going to spider the web and we only have one database again in this one we'll have two databases in the next one and so spider is the restartable part and what we actually do is we put one url in as the starting url and then spider walks in and asks are there any unretrieved pages and it does that randomly and sort of picks among the unretrieved pages and says okay great i'll go retrieve that page and then i'll parse that page and then i'll put in a bunch of new unretrieved pages okay as well as the text of that page and then it'll go back up and it'll say oh give
me one of the randomly non-retrieved pages and grab a next page and pull that page down and then add to it and so this is like there's a page and then a to-do list and then this one becomes a page and then adds a few more things to the to-do list and so the to-do list or the the unretrieved urls grows very rapidly um and the retrieved ones grow sort of as you retrieve them one at a time but you've always got this long list if you have a really short site that only has like two links if you start at doctorchuck.com page1.htm it'll go to page two and then go back to page one it'll be out of things it'll have retrieved all of the pages um and so if you have a website that has no external links or has very few pages and they point to each other this will run out of things to do but if you go to a page like my blog or the the code that i the the sample stuff that i have up for you to spider for testing on drchuck.net um it'll run for a very long time and you'll have far more pages to retrieve than pages that you retrieve but that's okay at some point you can stop this maybe it stops because you ran out of bandwidth or maybe your computer went down or who knows what right but it's okay this is a restartable process because it always has some pages that are retrieved and some unretrieved pages you start it back up it picks randomly from the unretrieved pages your database is the sort of persistent state of your spider rather than a bunch of dictionaries or lists inside the python which go away when the program dies and uh and so at some point you have let's just say a few hundred pages in here and a few thousand unretrieved pages you can run the page rank algorithm and what the page rank algorithm does is it loops through all the pages and figure out which pages are linked to which pages and then reads the numbers and then updates the numbers and then does that some number of times and so this is where the numbers all the pages sort of start out with goodness of one i 
think this printout is showing that goodness of one and then it changes and then the goodness goes to the sum of the goodness goes up to two some of the goes to seven and whatever but then it does this over and over and then it uses these numbers and then they change again and so there's a number of time steps that this page rank runs and you will see as the page rank runs when i show you the code you'll see the average sort of change in these numbers across all these things and you'll see that it the average goes down very rapidly as you get through and so usually with a few hundred or even thousand pages like a hundred plus times running this algorithm and these numbers have converged and that's when you sort of can begin to trust the numbers now there's this one program called sp reset which sets all the pages back to one so you can start this over so if you were to spider for a while run sp rank for a while play around and then you wanted to spider some more and start it over you could say oh let's start the page rank completely over or you could simply take the new pages and and and watch it adapt either way this is just a way to reset all the pages to have sort of their initial value of a goodness of 1.0 so at some point you run this this this runs really this part here runs really slow this part runs super fast like in the blink of an eye this one re is pretty fast and then at some point you've got these pages that have you know numbers on them they have values on the pages and there's a couple of programs that allow us to visualize that one is the dump which just reads it and checks to see it shows the the new page rank the old page rank um and various other things and shows just a way to dump it and then there's this thing that reads the whole thing you hum you say i'd like to do 25 the top the best it sorts it by page rank and then produces a javascript file it has just the the numbers in it and then there is some html and a visualization library called 
d3.js which you can read about that when the html starts it reads this and has this nice force directed layout of the page rank and you can hover over things and you can see uh what page rank you've got and so that is the page rank algorithm that we're going to do and up next we'll do the largest and most complex of these things and that is the email we're gonna spider some email which is about a gigabyte of data okay [Music] the last visualization application that we're going to take a look at is mailing list and that's kind of ironic we started with mailing lists and we're going to end with the mailing list the mailing lists of course are from my open source sakai project which i love and am very proud of and so uh what we're going to do is we're going to crawl the archive of a mailing list and then we're going to do two visualizations one is an activity visualization and another is a word cloud so um probably the more important thing is when i do the demonstration of how the software works so this is a large data set so you've got to be careful uh this could spider gmane.org which is a very free and friendly archive this data originally came from gmane.org but i've got a copy of it and so gmane.org is not rate limited but if everyone who is watching this starts spidering gmane.org at the same time you will crash it it just doesn't have the horsepower to give you this data as fast and so i've got something that can give you the data super fast and has no rate limit on a really good server and it's cached all around the world using a technology called cloudflare so please please please don't point this at gmane.org point this at the url here mbox.dr-chuck.net etc etc and then you can run this as fast as you like now another thing to worry about is if you have a metered connection so don't do this on a cell phone connection because you'll pay thousands of dollars perhaps make sure you're on a no-cost connection um before you start running this
because this is going to pull a lot of data down if you just start this from scratch and you let it run on a super fast connection downloading the whole thing is probably about four hours on my home connection uh when i had like about a 10 megabit connection it took several days and so just understand that in this one it's both fun to deal with a ton of data and it's scary to deal with a ton of data so this one is big you'll see the process in action because it'll run for a while things will take a long time so here's basically the flow of the data in this particular one you are going to have the restartable spider that talks to the api at mbox.dr-chuck.net which has a scalable copy of all this information um and again it's going to do kind of a raw database not a very clean database it's sort of a mess it's just enough columns to keep track of whether or not we've got this page or not and so this has you know the ones we've retrieved so far and so what gmane.py does is it sort of scans down to see where to retrieve next gets that and then starts scanning and then adding things here so it just adds it and then it blows up and then it comes in again and says okay i'll start here and then it starts retrieving stuff and fills this in fills this in fills this in and sometimes you put like a delay in this so you don't overwhelm networks or don't overwhelm servers but basically this is pretty much a raw retrieval of the email messages and this file can get rather large this is the one that's greater than a gigabyte now this data is actually really nasty it's email data the date formats changed this is data that lasted from 2004 to like 2012 or 13.
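One way to picture the "scan down to see where to retrieve next" step is the sketch below. It is an assumption about how such a resumable spider might work, not the course's own code: the Messages table layout, the next_start and spider functions, the delay parameter and the five-message fake_archive are all hypothetical, and the fetch function stands in for the real network request.

```python
import sqlite3
import time


def next_start(conn):
    """Return the first message id not yet stored (1 for an empty database)."""
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE IF NOT EXISTS Messages (id INTEGER UNIQUE, body TEXT)"
    )
    cur.execute("SELECT max(id) FROM Messages")
    highest = cur.fetchone()[0]  # None when the table is empty
    return 1 if highest is None else highest + 1


def spider(conn, fetch, how_many, delay=0.0):
    """Retrieve up to how_many messages, resuming after the highest stored id."""
    start = next_start(conn)
    cur = conn.cursor()
    stored = 0
    for msg_id in range(start, start + how_many):
        body = fetch(msg_id)
        if body is None:
            break  # nothing more to retrieve
        cur.execute(
            "INSERT OR IGNORE INTO Messages (id, body) VALUES (?, ?)",
            (msg_id, body),
        )
        conn.commit()  # commit as we go so a crash keeps what was retrieved
        stored += 1
        if delay:
            time.sleep(delay)  # optional pause so the server is not overwhelmed
    return stored


# Hypothetical five-message archive standing in for the real mail server.
fake_archive = {i: "message %d" % i for i in range(1, 6)}

conn = sqlite3.connect(":memory:")
first_run = spider(conn, fake_archive.get, 3)
resume_at = next_start(conn)
second_run = spider(conn, fake_archive.get, 3)
print(first_run, resume_at, second_run)
```

Because the starting point is recomputed from the database on every run, stopping the program at any moment costs nothing more than the message that was in flight.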
um and so this data has got a lot of things wrong with it it even has things where people's email addresses changed and so it has this mapping file that comes along with it this mapping file says here's this one person and here are the six email addresses that they used throughout the life of the project and so it is relatively complex and so this part here is super slow um very slow this part here is slow but it'll take like depending on how fast your computer is somewhere between two minutes and 10 minutes this first part will take days perhaps depending on the speed of your network connection and so what gmodel.py does is it reads through this it actually wipes this out and recreates index.sqlite every time it runs so you can change any number of things you can re-spider things you can do whatever and often this is one of those cleanup processes where you have to tweak the cleanup process you look at your data like oh the cleanup missed something so i've got to run it again so this produces index.sqlite every time it runs so this is like two to ten minutes um gmodel.py is two to ten minutes and it like maps names and when it's all said and done this is very small highly normalized it's a nice data model this one here content.sqlite has an ugly data model index.sqlite has a pretty data model it's got foreign keys it's got all the stuff and all those things we talked about in the database chapter where it's efficient and so in your mind keep track of how fast it is to scan all the data in a database with a bad model and then watch when you run like gbasic.py which is a scanner or gline.py which produces line data or gword.py and watch how fast they run they run in like a couple of seconds at the most and this runs in two to ten minutes and the difference is that's because the data is efficiently modeled in index.sqlite so you can take a look at that using sqlitebrowser and take a look at the data
model and you'll see it looks just like the stuff we talked about in the database chapter it's got foreign keys and and all those things and so that runs and you've got this and then we do our visualizations and our analysis from this clean version of all the data and so g basic just loops through and prints some stuff out it's a great way to test things it's a pretty easy to understand program and you could take a look at it g line does some bucketing and makes it hit some histograms to produce a line graph and then g word does a different histogram it does a histogram of word frequency and then produces that is the word frequency ends up in gword.js and then we have two html files that use the d3.js visualization to produce a line and a word chart and so you know i'll in a in another video i will show you how this code works which is probably more useful than this picture but this is a whole bunch of good stuff in this particular application and and if you really understand everything in here you can build a pretty sophisticated uh data retrieval and analysis pipeline and so so that's it i thank you for watching all these lectures and look forward to seeing you on the net [Music] so hello everybody welcome to python for everybody we are doing some code walkthroughs if you want to follow along with the code you can download the source code uh from python for everybody dot the python for everybody website okay so the code we're playing with today is tw friends.py and this is a step beyond the simple uh tw spider it is a restartable spider but we're going to data model things a little bit differently we're going to have two tables and we're going to have a uh a many-to-many relationship except that it's sort of a many-to-many relationship between the same table which is okay um friends is a twitter friends are a directional relationship and so uh so we've started out here in tw friends.py remember that the filehidden.py i'll show it to you but i'm not going to open 
it because i've got my keys and secrets in it so this hidden.py file you've got to edit that and you've got to go to apps.twitter.com and get your keys and put them in there otherwise these things won't work but if you have twitter and you set your api keys up and you put them in hidden.py then all these things will work it's kind of fun actually and impressive uh not hard to do actually so the twitter url that's my library that reads hidden.py and augments the url and does all the oauth stuff json and ssl because python doesn't accept any certificates even if they're good certificates so we kind of crush that here's our friends list that we're going to hit we're going to make a database friends.sqlite now here we're doing create table if not exists so what this really is saying is i want this to be a restartable process and i don't want to lose the data we're starting out uh we do not have any sqlite files and so this is going to create the database and create these tables but the second time we run it we're not going to recreate the tables we're gonna be able to restart this because we're going to run out of rate limit before we finish this so we just have to wait however long the rate limit takes to reset and we'll watch the rate limit go down and so we're going to have a people table and we're going to have an id a primary key and the name the name is going to be unique and whether or not we've retrieved it and that's kind of from a previous one but then there's the who follows who um the from id and to id and so this is a direction and we're going to put a uniqueness constraint in just like we do in many to many's that basically says the combination of from id and to id has got to be unique we don't allow ourselves to put duplicates of the combination so from id can be one in many records and to id can be one in many records but one one is only allowed once and this is
the crud we have to do to convince python to accept the twitter uh certificate and so this is similar to some of the other stuff that we've done we're going to uh enter a twitter account or quit and if we hit enter by itself then we will actually go and retrieve a record that was not yet retrieved and now we're actually pulling out two values id and name and so we will we will grab fetch one is going to give us a two tuple basically and we're going to store that in id and account of course that's like this is this is coming back with a two tuple first of which is the id from the database limit one means we're only going to get one of these or zero of these if there are zero these that means there are no unretrieved twitter accounts retrieved equals zero well you'll see in a second that the all this new accounts we put in are the ones for which we haven't retrieved and again given that our rate limit we want to know which ones we've retrieved okay and um and so what we're going to do next is we're going to check to see if the person that we just checked which means the length of the account is great that we just were entered we're going to check to see if they're already there okay and we're going to select id from people where name equals so that's the one we just entered and we're going to fetch one and grab the first thing because we only we only got one thing in the select statement here um and if this person that we just asked to see is not in the table that means this is going to fail we're going to do an insert or ignore this or ignore is kind of redundant because we just checked to see if it was there but we'll put that in just to be safe and we're going to put the name in for as the new the new account that we're looking at uh and we're for indicating that retrieved is zero so that we will we will know that we haven't retrieved it yet you'll see that we'll update that in a second we commit it so that later selects will see this so that so you got to do the 
commit, or this later select wouldn't see the one we just inserted. We're going to ask how many rows were affected, and if it's not equal to 1 then we're going to complain about it. If we inserted it, we are going to do this thing: remember there was an id up there, right here, id INTEGER PRIMARY KEY, and we did not insert this here, but we want to know what that id is. Every time I was showing you that in lectures I was saying it's really easy in Python to do this. This cursor did the insert, and after the insert we're going to grab lastrowid, which is the primary key that was assigned by SQL. So that means that coming through this code here, at line 45, one way or another we're either going to know the id of the user that was there before or we just inserted one, and so we're going to know the primary key of the current user, and you'll see why we need that. So id is the primary key of the current user that we entered right here. Okay, and now what we're going to do is the Twitter URL, augmented with the OAuth and all the keys and the secrets in hidden.py. We're going to go through, let's say count 1000... what the heck, let's go 200, up to 200 friends. Save. Now let's do 100.
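The look-up-or-insert step described above can be sketched as follows. This is a minimal sketch, not the lecture's actual file: the helper name `get_or_create_person` and the in-memory database are assumptions made for illustration, and it tests `fetchone()` against `None` rather than using the try/except shown in the lecture. The People table and its id, name and retrieved columns follow the description above.

```python
import sqlite3


def get_or_create_person(cur, name):
    """Return the primary key for name, inserting an unretrieved row if needed.

    Hypothetical helper: the name and signature are illustrative only.
    """
    cur.execute('SELECT id FROM People WHERE name = ? LIMIT 1', (name,))
    row = cur.fetchone()
    if row is not None:
        # The account was already in the table, so reuse its primary key.
        return row[0]
    cur.execute('INSERT OR IGNORE INTO People (name, retrieved) VALUES (?, 0)',
                (name,))
    if cur.rowcount != 1:
        raise RuntimeError('Error inserting account: ' + name)
    # lastrowid is the primary key SQLite assigned to the row just inserted.
    return cur.lastrowid


# An in-memory database keeps the sketch self-contained (an assumption).
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute('''CREATE TABLE IF NOT EXISTS People
    (id INTEGER PRIMARY KEY, name TEXT UNIQUE, retrieved INTEGER)''')

first = get_or_create_person(cur, 'drchuck')
conn.commit()  # commit so later selects see the new row
again = get_or_create_person(cur, 'drchuck')
print(first, again)
```

Either way through the function, the caller ends up holding the primary key of the account, which is what the later insert into the Follows table needs.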
We'll keep it that way, and then we're going to retrieve it. We're retrieving the account; we're not going to print the nasty URL out, though we could. Then we're going to open the URL with the connection, read it, get the UTF-8 data from this, and then decode that so we have the Unicode data. So data is an internal Python string with all that data, representing all the wonderful characters. And of course we're going to ask urlopen to give us back the headers as a dictionary using this call, and we can see how many we have left, what the remaining rate limit is. Then we're going to parse the data with json.loads. Oh wait, I need a continue in here. Continue. Okay, save. If parsing this data fails we'll print it out, right, because that means this died, which means it's not syntactically correct JSON, basically. Who knows if we're ever going to see that, but at least when it blows up it'll print this data out. We'll catch it and then it'll continue. Actually, I'll make this a break, because if it's blowing up that badly we should quit. I don't yet know what happens when this rate limit says you can't have it, but I do know that when it's successful I expect there will be a key of users in this outer dictionary that we're going to get. If users is not in the parsed dictionary then I'm going to dump out this data so that at least I can debug what happens when I've got some broken JSON. So the difference is this: this code is going to fail when the JSON is syntactically bad, meaning a curly brace isn't right or whatever, and this code will trigger when I get good JSON but I don't have a users key in it. Okay, so once we've retrieved it and we're pretty happy with it, we're going to update the account that we're retrieving. We're going to set this as one
of our retrieved accounts. Okay, and then what we're going to do is write a loop that goes through all the friends of this particular user that we're asking about, gets each screen name, and prints it out. Then we're going to check to see if this one is already in our People table, because this is a spider; we're grabbing accounts. So we'll select the friend id, do a fetchone(), and grab the sub-zero thing. If this person's not in there, grabbing the sub-zero thing from fetchone() is going to blow up, which means we're going to drop down to the except code. But if it does work, we have friend_id, they're there, and they're already in our database; they just weren't retrieved. If the friend id wasn't there, we're going to do an INSERT INTO, setting retrieved to zero, and then we're going to commit. Now remember, rowcount is how many rows were affected by this last transaction, cur.rowcount, and we're going to die if that insert doesn't work. This is unlikely unless somehow we've run out of disk space or something. Then we're going to grab the friend id as the key of the last row that was inserted. We're only going to insert one row, so it's basically the primary key of the row that we just inserted. So if you look at this code right here, it comes out the bottom one way or another with friend_id successfully set: either they're already in our database, or they're not and we insert them, and then we have it. Now this countnew and countold is just so I can print out a nice printout. Now we are going to insert into the friend table, which is called the Follows table in this case, from_id and to_id. Those are the two outward-pointing foreign keys. We have the id of the account that we are retrieving the friends of, and then this particular friend, so we're inserting the connection from this person to that person. And then we commit it. We want to commit these again so that later selects
when the loop goes back up, get all of that data that's going on. Okay, so we do want to commit from time to time, and then we close the cursor at the very end. Okay, so let's run this and see what happens. So, python twfriends.py. Of course, I am a refugee from Python 2, so I always forget to type python3. Okay, so we're going to start. I'm going to start another tab over here and do ls -l *sqlite. Now that sqlite file is there, and it has actually made the tables. If you go up here, it ran all this stuff, create the tables, yada yada, and we're sitting right here at this line. As a matter of fact, I think without causing too much trouble I can open that database. There is no data in the Follows table and there is no data in the People table; it's completely empty. Okay, so we're waiting for the first one, and I'll go with mine, drchuck. It's retrieving the 100 friends, and they were all brand new, so they were all inserted. Now if I hit refresh we will see that drchuck is retrieved. And who follows whom? These are all the people I follow: one follows two. If we look here, we see that drchuck follows Stephanie Teasley. Because we grabbed the people drchuck follows, we're going to have a record in Follows for each of them. So these are all the people I follow, and we put them in. Okay, so we can go back and grab somebody. Let's go grab Stephanie Teasley and pull out her friends. So we grabbed a hundred of her folks. I've got 14 left; that's my x-rate-limit. So I did Stephanie Teasley, so let's go back here. You'll notice there's 101... there's probably going to be, oh, 182.
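The two tables seen in the browser above, and the way a bare enter picks the next account, can be sketched like this. It is a sketch under assumptions: the table and column names follow the walkthrough, the UNIQUE constraint on Follows and the ORDER BY are added here for illustration, the screen names other than drchuck are made-up placeholders, and the helper `add_person` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(':memory:')  # in-memory database, for illustration only
cur = conn.cursor()
cur.executescript('''
CREATE TABLE People (id INTEGER PRIMARY KEY, name TEXT UNIQUE, retrieved INTEGER);
CREATE TABLE Follows (from_id INTEGER, to_id INTEGER, UNIQUE(from_id, to_id));
''')


def add_person(name, retrieved=0):
    # Hypothetical helper: insert one account and return its primary key.
    cur.execute('INSERT INTO People (name, retrieved) VALUES (?, ?)',
                (name, retrieved))
    return cur.lastrowid


# drchuck has been retrieved; the two friends (placeholder names) have not.
chuck = add_person('drchuck', retrieved=1)
for friend in ['friend_one', 'friend_two']:
    friend_id = add_person(friend)
    # One row per connection: from the retrieved account to this friend.
    cur.execute('INSERT OR IGNORE INTO Follows (from_id, to_id) VALUES (?, ?)',
                (chuck, friend_id))
conn.commit()

# Hitting enter by itself: take the first account not yet retrieved.
cur.execute('SELECT id, name FROM People WHERE retrieved = 0 ORDER BY id LIMIT 1')
next_id, next_name = cur.fetchone()

# Read the connections back by name with a join across both foreign keys.
cur.execute('''SELECT a.name, b.name FROM Follows
    JOIN People AS a ON Follows.from_id = a.id
    JOIN People AS b ON Follows.to_id = b.id
    ORDER BY b.id''')
links = cur.fetchall()
print(next_name, links)
```

Storing only ids in Follows keeps each name in one place, so an account that many people follow still has a single row in People.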
Uh, that's interesting. So we've retrieved drchuck and Stephanie Teasley, and let's go take a look in the friends table, the Follows table. Okay, so we have all the people I follow, and now all the people Stephanie follows. So there we go. Let's go ahead and do somebody else. I think we both follow Tim McKay. Where's Tim McKay? Yeah, let's do Tim McKay and see who Tim follows, to see if we can get an overlap. Oh, we revisited some. Let's see if we can see this in Follows... and in People. So we've got drchuck retrieved, and Tim McKay is somewhere down here. Yeah, it might take us a while before we get any really good overlaps. Let's see, let's do a database call, a SQL select count... okay, let's just run this some more. It's clearly working. Now one thing I can do here is hit enter, and it will just pick one, sort of randomly. So it grabbed live edu tv, and let's see how many I've got left: we've got 12 left. Now I can hit enter again and it picks another one. That was the next one. Is it picking them in order? Let's go to People. Yeah, we can see that it's just going to do the first unretrieved person, who's Nancy. Let's let it retrieve Nancy. So it grabbed Nancy. New... so we're finding some, and this table's getting really big. If we look at the People table we now have 455 people, and we have 467 Follows records. So there we go. Oops, hit enter, it does another one, and away we go. So you get the idea. I can type quit to finish. And just to give you an interesting bit of code to show you how to do selects, I'm going to do this twjoin. Oh, let me show you one thing first: ls -l friends*sqlite. So this database has it, so I can restart this process and run it again and the database is still there. And so we just grabbed swear trek, and we can keep doing this, and this data keeps extending, and so this
is a restartable process. I can run it and then tell it to grab the next unretrieved one, and away we go. So that's part of it. Now, I've got eight left... oh, how many do I have left, really? Let's keep going. I've got five left. Okay, I guess we'll just run it out. So I've got four left. You know what I should do? I can change the code. I can stop the code and quit it. So what I'm going to do is change this code a little bit, really quick, and print the rate-limit headers at the beginning and at the end. So now I can run it again. I changed the code and hopefully didn't make a Python error. Let's go get another one: anna navarro. And so I've got three left. We'll see what happens when I run out of rate limit. So we have one left. Hit enter, hit Ctrl-K... open source.org. So we have zero left; that worked. Now let's see what happens. I don't know what happens next. Oh, we blew up: too many requests. We got an HTTP Error 429.
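One way to keep the spider from crashing when the rate limit runs out is to catch the exception around the retrieval and print its message. The sketch below is not the lecture's file: `fetch_text` and `rate_limited_opener` are hypothetical names, and the fake opener stands in for `urllib.request.urlopen` so the example runs without a network connection or Twitter credentials.

```python
import io
import urllib.error
import urllib.request


def fetch_text(url, opener=urllib.request.urlopen):
    """Return the decoded body for url, or None if the server refuses the request."""
    try:
        connection = opener(url)
    except urllib.error.HTTPError as err:
        # Printing the exception object shows its message,
        # for example "HTTP Error 429: Too Many Requests".
        print('Failed to retrieve', err)
        return None
    return connection.read().decode()


def rate_limited_opener(url):
    # Hypothetical stand-in for urlopen that always reports an exhausted rate limit.
    raise urllib.error.HTTPError(url, 429, 'Too Many Requests', None, io.BytesIO(b''))


result = fetch_text('https://example.invalid/friends', opener=rate_limited_opener)
print(result)
```

In a loop like the one in this spider, a `None` result would be the signal to break out rather than keep asking for more accounts.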
So that means that going for mark cuban blew up, and that was in line 48. So the right thing to do would be, in line 48, to put this in a try/except block, because it gives us an error, and print... oh fiddlesticks, how do I print the exception message? I am always forgetting. Print 'Failed to retrieve'. Okay, so we'll put that in. Oh, and then I have to put a break here, because carrying on is not good. Break. I never know how to print out the error message. That's the weird thing: I don't ever remember the syntax for what I say here to print the error message out. So I'm going to go to Google and search for 'print out the exception message in python'. Oh, Python 3, hello. Okay, so let's go find it here in the documentation. except... is this it? Is this what I say? I just want to print out the message. Ah, that's it, except. Let's try this. So this is part of Python programming, for me at least, because I'm just not a genius expert at this stuff. One thing I like about Python is you can guess stuff, and sometimes you guess right. So there we go, we got the nice little error message, and we see Error 429, too many requests. So that cleans that up nicely.
Hello everybody, welcome to Python for Everybody. This is another worked code example. You can download the sample code zip file if you want to follow along. The code that we're working on today is what I call the geodata code, and that is code that is going to pull some locations from this file. We're using, or simulating, the Google Places API to look places up so we can visualize them on a map. So this is the basic picture. If we take a look at this where.data file, it's just a flat file that has a list of organizations, and it was actually pulled from one of my MOOC surveys. We just
let people type in where they went to school, and this is just a sample of them. So this data is read in by this program, geoload.py. If you recall, this Google geodata has rate limits; it also has API keys, which we'll talk about in a bit too. The idea is this is a restartable, spider-like process, so we want to be able to run this and have it blow up, then start it again and not lose what we've got. So this is unlike some of the earlier ones: we're now using a database as well as an API, and in order to work around the rate limits of this API we're going to use the database for the restartable process. Then we'll make some sense of this and visualize it. But in the short term let's start with the geoload.py code. Take a look here. A lot of this hopefully is somewhat familiar to you by now: urllib, json, sqlite3. I mentioned that the Google APIs used to be free and did not require an API key, but increasingly they're making you use API keys, especially for new ones. So you can go to Google APIs and get an API key and put it in here. It'll be this big long thing that looks like that, and then if you have an API key you can use the Places API. I've got a copy of a subset of it, not all of it, here at this URL. As a matter of fact you can just go to this URL in a browser and it will tell you a list of the data that it knows about. And I made it so that it follows the same basic protocol, with address= for the address lookup, as the Google Places API. So this will just change how we retrieve the data. The nice thing about my server is it's got no rate limit, it's really fast, and you're not fighting with Google all the time. It also means that if you're in a country where Google is not well supported, you can use my API. I mean, it's really strange that somehow my API is more reliable and
available than the Google one, but it's true. So we're going to make a database. We're going to do a CREATE TABLE IF NOT EXISTS, and we'll have an address, and we're really just caching the geographical data; we're going to cache the JSON. One of the things we do when we build these processes is simplify: we don't do all the calculation and parsing of the JSON, we just load it and get it in, and fill the data up in this database. So that's what we're going to do. Because Python doesn't ship with any legitimate certificates, we have to sort of ignore certificate errors. We're going to open the file, loop through it, and pull out the address from the file, and we are going to select from the geodata where that address is the address. Let's move this in a bit. The idea is that if it's already in the database we don't want to do it again, so we do fetchone() and pull out that first thing, which will be the JSON, right there. If we get that, we'll continue back up; otherwise we'll keep going. pass just means don't blow up, so we except and just do a pass; that's like a no-op. We're going to make a dictionary, because that's what we do for the key-value pairs. Everything you've seen so far has used constants here, but because we may or may not have an API key, it's query= and then the address, and then key= and then the API key. If you recall, urlencode adds the pluses and question marks and all that nice stuff. We're going to retrieve it, read it and decode it, print out how much data we've got, and add to the count. Then we're going to try to parse that JSON data, and print it if something goes wrong. As we've seen, the top level of the JSON data from this geocoding API is an object, which we'll see a little bit of in a bit, and it has a status field in it, and the status is OK if things
went well. So if the status is not there, that means our JSON is not well formed or not what we expect. If the status is not OK and not ZERO_RESULTS, then we print out failure to retrieve and quit. Then we're simply going to insert this new data that we just got and commit it. Every tenth one, this is count mod 10, we're going to pause for five seconds, and we can hit Ctrl-C here. Then we're going to do the geodump. Okay, so let's just run this. Let's do an ls. Oh, we do have a geodata.sqlite from a previous test; let's get rid of it so we'll start with a fresh set of data, and run python geoload.py. Of course, I'm forever making the mistake of forgetting python3. So you can see that it's running and it's adding the query, and in this case I don't have the API key. It's putting the pluses in, and this part here with all the pluses, that's the urlencode. You notice it's pausing a bit. Depending on how fast your net connection is, this may or may not go so fast, but this is not that much data; it's only two or three thousand characters. So it's working and talking to my server. The interesting thing here is I can blow this up. I'm going to hit Ctrl-C. In Linux you'd hit Ctrl-C, and in Windows I think you'd hit Ctrl-Z, depending on what shell you're working in. I hit Ctrl-C and you see I sort of blew it up, right? It causes a KeyboardInterrupt traceback. If we do an ls -l, you can see that now this geodata file is there. Now, in the name of restarting, I will restart this, and you will see that it checks and skips. It runs this code right here, where it grabs it and finds it in the database. So you'll see it say found in the database, really quick, chop chop chop, and go really fast, and then it'll go back to
catching up where it left off. So all those up there, it did not actually retrieve them, because it knew about those things. Now it's catching up and doing some more, and some more. It has a little counter in here so that if it hits 200 it stops and you have to restart it. You could obviously change this code. You could make it so it didn't sleep, though it doesn't hurt to sleep for a second after every 100 or so. Now let's just hit Ctrl-C and blow it up, and do ls -l. And there is another bit of code. It's always good to write these really simple things. We're going to import sqlite3 and json, connect ourselves up, and open the output file with UTF-8. We're going to read through, and in this case we did SELECT * FROM Locations. If you recall, Locations has a location and a geodata, so the sub-zero will be the location and the sub-one will be the geodata. We're going to convert it to a string and then parse it. If something goes wrong with the JSON we'll just skip it, and we check to see if we have the status in our JSON. Let me run the SQLite browser here. File, Open Database; let's take a look at what's in this database. Where are we? code3, geodata, geodata.sqlite. So this is the data we've got. Can I make that bigger? Yeah, it's not going to show us much. So you can see that these are the addresses, and the geodata is just the JSON that we got when it retrieved it. So this is a really simple database; it's just a sort of spidering process, run, run, run. But now we're going to run the geodump code, which is going to read this and dump this stuff out and write where.js. So it's going to actually parse this stuff, and
that's code we've seen before. So we're actually reading it, and this line goes into the results. results is an array, so we're going to grab the zeroth item in that array, then find geometry, then location, then lat and lng for the latitude and longitude. We're also going to take the actual address out of the formatted address right here. So in this bit of code we're actually parsing the JSON. We're going to clean things up and get rid of some single quotes. This kind of data cleaning is just stuff where, after you play with it for a while, you realize, oh, my data is ugly. I'm going to print it out and then write it into a JavaScript file. The JavaScript file is this where.js, and I'll show you what it looks like. It's going to be overwritten; this is the one that came out of the zip file. It'll have the latitude and the longitude, and we're going to use JavaScript to read this in. This where.html file is going to read it right there and pull that data in, and that's how we're going to visualize. I'm not going to go into great detail on how the visualization happens, but that's what's happening. So we're going to actually write this to a file. Let's go ahead and run this code: python3 geodump.py. Okay, so it wrote 120 records to where.js. If we look at where.js, this is now the new data that I just downloaded moments ago, and it says open where.html in a browser. Now, this way you'll need the Google Maps API, and you might not be able to see this depending on where you are, but here you go, with Google Maps locations. I think if you hover over this you can see it, and you can see in that particular one why we had to use UTF-8 when we wrote the file, so that we didn't end up with trouble writing the file out. And so there you go, and so that is a simple
visualization. We just wrote this where.js, and if you are smart with HTML and JavaScript you can look at this where.html file. It's really just reading through a bunch of data and putting the points on the map; that's all there is. But I'm not going to go through that, at least not in this one. So I hope that this was useful to you, and thanks for watching.
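The restartable caching idea behind the geodata walkthrough can be sketched in a few lines: look each address up in the database first, only retrieve the ones that are missing, then walk the cached JSON from results to geometry, location, lat and lng. This is a sketch under assumptions, not the course files. The function names, the `fake_fetch` stand-in for the web request, the coordinates it returns, and the exact column names are all made up for illustration.

```python
import json
import sqlite3


def load_addresses(conn, addresses, fetch):
    """Cache JSON for addresses not already stored; return how many were fetched."""
    cur = conn.cursor()
    cur.execute('CREATE TABLE IF NOT EXISTS Locations (address TEXT, geodata TEXT)')
    fetched = 0
    for address in addresses:
        cur.execute('SELECT geodata FROM Locations WHERE address = ?', (address,))
        if cur.fetchone() is not None:
            print('Found in database', address)
            continue  # already cached, so skip the retrieval
        data = fetch(address)
        try:
            js = json.loads(data)
        except json.JSONDecodeError:
            print('Bad JSON for', address)
            continue
        if 'status' not in js or js['status'] not in ('OK', 'ZERO_RESULTS'):
            print('Failure to retrieve', address)
            break
        cur.execute('INSERT INTO Locations (address, geodata) VALUES (?, ?)',
                    (address, data))
        conn.commit()  # commit each row so a Ctrl-C loses nothing already stored
        fetched += 1
    return fetched


def dump_points(conn):
    """Parse the cached JSON into (lat, lng, address) tuples."""
    points = []
    cur = conn.cursor()
    for address, geodata in cur.execute('SELECT address, geodata FROM Locations'):
        js = json.loads(geodata)
        if js.get('status') != 'OK' or not js['results']:
            continue
        first = js['results'][0]
        location = first['geometry']['location']
        where = first['formatted_address'].replace("'", '')  # drop single quotes
        points.append((location['lat'], location['lng'], where))
    return points


def fake_fetch(address):
    # Hypothetical stand-in for the web request; the coordinates are made up.
    return json.dumps({'status': 'OK', 'results': [{
        'formatted_address': address,
        'geometry': {'location': {'lat': 42.28, 'lng': -83.74}}}]})


conn = sqlite3.connect(':memory:')
first_run = load_addresses(conn, ['School A', 'School B'], fake_fetch)
second_run = load_addresses(conn, ['School A', 'School B', 'School C'], fake_fetch)
points = dump_points(conn)
print(first_run, second_run, len(points))
```

The second call only retrieves the one address it has not seen, which is the behavior that makes the process safe to interrupt and restart.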
Info
Channel: My CS
Views: 26,044
Keywords: python, python tutorial, python language, python full course, python course, learn python, learn python programming, python tutorial for beginners, python tutorial 2021, python programming tutorial, python programming language, python for beginners, python crash course, python 2021, python tutorial for beginners full, python (programming language), python basics, python from scratch, python programming, getting started with python
Id: P3EKA_7CCFY
Length: 661min 4sec (39664 seconds)
Published: Sun Feb 28 2021