Data Analysis with Python for Excel Users - Full Course

If you already work with data in Excel and want to add more power to your data analysis and evaluation using Python, then this is the course for you. Frank is a data scientist, and he will teach you how to use Python to work with data.

Hi, everyone, my name is Frank Andrade, and this is my Python course for Excel users. I created this course to help Excel users move from Excel to Python. But why Python? Well, in Python we can do most of the things we would do in Excel, such as working with data, making charts, and building pivot tables. But that's not all. We can use all the power of Python to automate tasks, work with large data, and do lots of other things, thanks to the thousands of free libraries Python has. On top of that, Python can help you become a better data analyst or get into new fields like data science.

I divided this Python course for Excel users into three modules. In module one, I'll teach you all the Python core concepts you need to know for data analysis. Then in module two, we'll learn pandas. Pandas is a Python data analysis library that will help us do most of the things we can do in Excel. In module three, we'll put into practice what we learned in this course by creating a pivot table and visualizations such as line plots, bar plots, and pie charts. Remember that in the description you will find the files and code, as well as a free PDF Python cheat sheet I created for this course. There you will find the concepts, methods, and functions we will see in this course. By the way, I'm Frank, and I will be your instructor in this course. So let's get started.

To download Anaconda, we go to anaconda.com and click on Get Started. Then we choose the last option, Download Anaconda Installers. And here we have the different Anaconda installers, for Windows, Mac, and Linux. In my case, I'm going to choose Mac, and I'm going to choose the 64-bit graphical installer.
So now I'm downloading Anaconda. Once it's downloaded, I'm going to click on it, and a message will pop up. You just have to click on Allow, as I'm going to do right now. So just click on Allow and then click on Continue until the installation starts. So I click Continue, then Agree, and then Continue, and it's going to start installing Anaconda. In case you're on Windows and you're installing Python or Anaconda for the first time, make sure to check the first box you see now on screen. So I'm going to speed up the video now.

Okay, the installation is almost done. Now it's telling me that Anaconda works with PyCharm, and I'm just going to click on Continue to finish the installation. Then we see a summary of what was installed. Now I'm going to close this window and open Anaconda. So I'm going to locate the icon, the green icon you see here, and open Anaconda. I'm going to wait a couple of seconds, and let's see what was installed. Here we have JupyterLab and Jupyter Notebook, which are widely used in data science. So I'm going to launch Jupyter Notebook. Here it's opening Jupyter Notebook; let's give it a second. And now we open a new notebook with Python 3. So Python 3 was installed too, and that's it. In the following videos, we'll learn how to use Jupyter Notebook.

In this video, I will introduce you to the Jupyter Notebook interface. Jupyter Notebook is an open-source web application that allows us to create and share documents that contain live code, equations, visualizations, and text. It is a perfect text editor for data cleaning and transformation, data visualization, and data analysis. This is why Jupyter Notebook is widely used in data science and also machine learning.
As you might remember, we installed Jupyter Notebook and Python with Anaconda Navigator, and this means that we already have installed some popular libraries used in Python for data analysis. By the way, one of the alternatives to Jupyter Notebook is JupyterLab. Both are similar, but we're going to use Jupyter Notebook in this course because of its simplicity.

So let's open Jupyter Notebook. To do that, we click here on the Launch button. Now we wait a couple of seconds, and here we have the interface of Jupyter Notebook. I'm going to maximize this. By default, Jupyter Notebook opens the root directory of your computer, so it's a good idea to create a folder where all your Python scripts will be located. In my case, this folder is called Anaconda scripts. So I click here, and now I can navigate through the folders. The folder I'm going to use for this example is this one that says my course.

Here we're going to create our first Python script. To do that, we click here on the New button and then on the first option, which says Python 3. There are other options like Text File, Folder, or Terminal, but we're not going to use them in this course. So click on Python 3, and now we have a Python script powered by Jupyter Notebook. Here on the right, you can see that it says Python 3, and there is also the Python logo. On the left, you can see the Jupyter Notebook logo and also the name of this Jupyter Notebook file. We can change the name of the file by clicking here on Untitled. So I click here, and I can change it to, let's say, example. I write example and click on Rename, and now we've renamed this Jupyter Notebook file.

All right, now let's navigate through the menu bar that we have here in this Jupyter Notebook file. The first option is File. Here, we can create a new notebook with Python 3.
So if we click here, we're going to open a new Jupyter Notebook file from scratch, as we did before. Then we have Open, where we can open a Jupyter Notebook we created before. We can also make a copy of a Jupyter Notebook and then change the name, and we can rename the file as we did before; we only click here and rename the file. Then we can save all the progress we make in Jupyter Notebook. For example, after writing many lines of code, you can save your progress by pressing Ctrl+S, or Command+S on Mac, and you're going to create a checkpoint. Later you can revert to a previous checkpoint by using this option here. By the way, Jupyter Notebook autosaves periodically by default (every couple of minutes in the classic interface), so there is no need to press Ctrl+S every time. Keep that in mind. Then we have other options that I don't use much, like printing this Jupyter Notebook or exporting the Jupyter Notebook file to HTML, PDF, and so on.

Okay, now let's see the second option, which says Edit. Here we can edit all the cells we have in this Jupyter Notebook. By the way, what you see here on the screen is a cell. For example, with the Edit option we can cut cells, copy cells, paste cells above, and delete cells. On the right, you can see the shortcuts, which we're going to see in detail in the next video. And you can check here all the edit options that you can perform in Jupyter Notebook.

Then, in the View option, we can toggle the header, the toolbar, and also line numbers. If I click on Toggle Header, the header disappears. If I click on Toggle Toolbar, the toolbar disappears too. And with Toggle Line Numbers we can show line numbers. So if I write anything, we can see that it says 1, 2, 3, and so on.
I'm not going to use this in this course; I'm going to leave it with the default options. So here I'm going to revert to the original options: no line numbers, and I want to show the header and also the toolbar. But you can personalize it as you want. Next, in the Insert options, we can insert cells above or below; we only click here. We're going to see the shortcuts later, in the next video. Then we have the Cell options, where we can run cells or run all the cells in this Jupyter Notebook file.

Then we have the Kernel option. A kernel is a computational engine that executes the code contained in a notebook document. When we open Jupyter Notebook, a kernel is automatically launched, and we can interrupt this kernel by clicking here. By interrupting, we stop the execution of our code. We can also restart everything and do more things here. Sometimes, for example, I interrupt the kernel when a line of code or a cell takes too much time to execute, and you can do the same here with Restart or Interrupt. Then we have the Navigate option, which doesn't actually have anything here; Widgets, which I don't use much; and Help, which I think will send you to the documentation of Jupyter Notebook, and you can read it if you want.

All right. Then we have the toolbar, where you will find some shortcuts for the menu bar options we've seen before. For example, here you can save and create a checkpoint. So I click here, and as you can see, it says Checkpoint created and the time it was created. Then, with this plus button, you can insert a cell below. So I click here, and as you can see, a cell is inserted below. You can also use shortcuts, but I'm going to show you those in the next video. Then we can cut selected cells with this button, we can copy a cell with this button, and we can also paste cells below.
Also, we can move a cell above or below. For example, I'm going to write anything here in this cell, and I can move it above with this button or below with this one, as you can see here. Then we can run this code. For example, I can write the number one and then run the code. As you can see, the code ran and it shows the number one. Those are some of the frequently used buttons in the toolbar, and that's everything you need to know about this Jupyter Notebook file.

Okay, now, before finishing this video, I'm going to show you some other options that you can find here in the Jupyter Notebook interface. Right now we are in the Files tab, and we can change to the Running tab here. Here you can see all the currently running Jupyter Notebook processes. For example, we can see the Jupyter Notebook file we created and opened. You can recognize that a Jupyter Notebook file is open, or running, because its icon will be green. So if we go back to the Files tab, we can see that this Jupyter Notebook file, which by the way has the .ipynb extension, has a green icon. This indicates that the file is running because, well, it was opened.

So here we can see that it is open, and we can shut down this file. This is different from closing the file. For example, here I have the file, and if I close it, we can see that the file is still running. Here the icon is still green, and in the Running tab it still shows up. So if we want to shut down this file, we click here. Now it says that there are no notebooks running, and we can see that the notebook has a gray icon.

All right, then we have the Clusters tab. I don't use this tab much, and it actually doesn't show anything here. And then we have the Nbextensions tab.
Here, you can install extensions to personalize Jupyter Notebook even more, and we're going to see some cool Jupyter Notebook extensions in the next videos. By the way, this Nbextensions tab doesn't show up in some versions of Jupyter Notebook, but we can easily install it, and we'll also see how to install this Nbextensions tab in the next videos. Finally, we have this box that shows our directory. This folder icon indicates the root directory. So if I click here, we are now in the root. And if I click on the folders Anaconda scripts and then my course, I go back to the folder where I was before. And that's it. These are all the things you need to know about the Jupyter Notebook interface.

Okay, in this video, we're going to see the cell types and cell modes in Jupyter Notebook. First, we're going to open the Jupyter Notebook file that we created in the previous video, which is this one, example.ipynb. So we click on it, and here we have the Jupyter Notebook file opened. By default, we have this first cell in command mode. We can tell that this is command mode because this blue color indicates that the cell is in command mode. When we are in command mode, we can do things outside the scope of any individual cell. So basically, all the tools we see here in the toolbar can be applied in command mode.

Also, in command mode we can apply some shortcuts that I'm going to show you later. For example, if we want to see the shortcut window, we press the letter H in command mode, and we can see the keyboard shortcuts here. So here you can see all the shortcuts that you can apply in command mode. Now I'm going to close this window. You can also apply different shortcuts. For example, if you press B in command mode, you will see that there is a new cell, because B is the shortcut that inserts a new cell below.
Now, if we press Enter, you're going to see that the color changes to green. This green color indicates that we are in edit mode. Edit mode is for all the actions you will usually perform in the context of the cell, for example, introducing text or writing code. So here I can write, say, 123. If I write 123 and then click on this Run button, I'm going to run this cell. As you can see, I ran this first cell. Also, after running the cell, you can see that we are again in command mode. To go to edit mode, we press Enter again, and now we can edit the numbers we introduced. For example, I can add 456 and then run again. And here you can see that the output shows 123456.

By the way, if you try to use the shortcuts in edit mode, they won't work. Here I press Enter, and now I'm in edit mode. If I press the letter H, you can see that nothing happens; we don't get the shortcut window. And if I press the letter B, you can see that we don't insert any cell below. This happens because those shortcuts work only in command mode. To leave edit mode, we have to press the Escape key. So I press Escape, and now I'm again in command mode. If I press H, we get the keyboard shortcuts window, and if I press B, you can see that we inserted a new cell. And that's it for the command and edit modes.

Now we'll see the cell types in Jupyter Notebook. There are three main cell types, and we can see all of them in this drop-down here. Right now the type of this cell is Code, so here it says Code. But if we press here, you can see other cell types like Markdown and Raw NBConvert. We're going to see the code cell first, and it already has the check mark, so this one is a code cell. So now I press here, and, well, it's a code cell. If I press Enter, I'm in edit mode, and here I can introduce any code I want.
So here I can write any number, say 99. If I press Ctrl+Enter, we can see that this is the input and here we got the output of this code. We're going to see how the code cell works throughout this course, but now it's time to see how the Markdown cell works in Jupyter Notebook.

So here I go to the cell, and now I'm going to change the cell type. I press here on the drop-down and select Markdown. In a Markdown cell, we can introduce any type of text we want. For example, we can introduce titles. If I delete this and type the hash sign, we get a title; one hash sign means title. So I press space and write Title. Now I press Ctrl+Enter or this Run button to run the cell, and here we got the title. By the way, you shouldn't get this number one, because I modified the default behavior of Jupyter Notebook. Mine numbers the titles and subtitles, but in your case, you will see only the word Title.

If you want, you can also introduce subtitles. For example, I'm going to insert a new cell with this plus button, and now I'm going to move this cell up with this button here. Now I'm going to change the cell type from code cell to Markdown cell, so I go to the drop-down and select Markdown. By the way, you can also change the cell type with shortcuts. If you're in command mode, you can press the Y key to change the cell to a code cell. So I press Y, and as you can see, it says In, and this In with square brackets indicates that this is a code cell. Here I can press Enter and introduce any code. I introduce numbers and press the Run button, and you can see that we have an input and an output. So this is a code cell. Now we can press the M key to make this cell a Markdown cell. So we press M while we are in command mode, and now we get this Markdown cell here.
You don't see the word In with the square brackets anymore. So now I'm going to edit mode; I just click here, or you can press Enter to go to edit mode. In order to introduce a subtitle, I'm going to write a double hash sign. So I type the hash sign twice, press space, and write Subtitle. I press Ctrl+Enter or the Run button to run the cell, and we've got a subtitle here.

We can also introduce text. I'm going to insert a new cell. You can do it with the plus button or with the B shortcut; I'm going to do it with the B shortcut right now. I press B, and here I got this new cell. We can move it with this button here, and now we have this cell in the position we want. Here I can introduce text by converting the cell to Markdown. So I choose Markdown, press Enter to go to edit mode, and here I can introduce any text. For example, I can write hello and press Ctrl+Enter, and now we can see that we have this text here.

Finally, the last type of cell is Raw NBConvert. This type of cell is never evaluated by the notebook kernel. So if we convert this code cell to a raw cell, it won't be evaluated by the notebook kernel. Let's try it. I select Raw NBConvert, and now we can see that this looks like a plain cell. This type of cell is not used that often; actually, we're going to use only the code cell and the Markdown cell in this course. And that's it. In this video, you learned the cell types and cell modes in Jupyter Notebook.

Okay, in this video, we're going to see some common shortcuts used in Jupyter Notebook, and we're going to start with the F shortcut. By the way, to use this shortcut, you have to make sure you're in command mode, and to verify that you're in command mode, make sure that the cell has this blue color.
Okay, now that you're in command mode, you can press the letter F, and you're going to see this Find and Replace window. This first shortcut allows us to find a word in a cell and then replace it with another word. For example, I can write here the word hello, and it found the word hello inside this hello world sentence. Now I can replace this word with another word, say, hi. So here I write hi. In red we can see the match, and in green we can see the word that we're going to insert. Let's click on Replace All, and now you can see that it doesn't say hello world anymore; it says hi world.

Now I press Ctrl+Enter, which is another shortcut, to run the cell. So you can click here on Run or just press Ctrl+Enter to run this cell. I press Ctrl+Enter, and now we ran this cell. Another way to run cells is to press Shift+Enter, but in this case, we're going to run the cell and insert a new cell below. So I press Shift+Enter, and note that it ran the cell, because now it says In with the number three inside square brackets, and we can see that we have a new cell.

Okay, another pair of shortcuts that is often used is Y and M. Right now this cell is a code cell. If we want to make it a Markdown cell, we only have to press the letter M; it is going to be converted to a Markdown cell. And if we press the letter Y, it is going to be converted to a code cell. You can also change the heading here; you can make the heading bigger or smaller. So I'm going to select the cell, and now, to make this one smaller, we can press the numbers. If we press the number two, we can see that this one gets smaller. If I press number three, the title gets smaller still, and so on. As you can see, the more hash signs, the smaller the text. So here I'm going to delete these hash signs. One hash sign represents the biggest font size, which is the title.
So now we press Ctrl+Enter, and we have this cell in heading one. But if I press number five and then press Ctrl+Enter, we can see that now this cell has heading five and it's smaller. So I'm going to revert to heading one: I press one and then Ctrl+Enter.

Okay, now we can navigate through the cells by pressing the up or down keys on our keyboard. As you can see, we can navigate through all the cells, or we can also click with the mouse on the cell we want. We can insert a new cell above by pressing the A key. So if I press A, we get a new cell above, and if I press B, we get a new cell below. Now, if I press X, we're going to cut the cell. So I press X, and you can see that the cell was cut. If we press V, we paste that cell below. So I press V, and we got the cell back. And if I press Shift+V, the cell is pasted above. So I press Shift and V, and we get this new cell above the cell I have here.

Okay, now I can delete cells by pressing D twice. So I press D two times, and as you can see, the title disappeared. I try again, and we don't have the title anymore. But if we press the letter Z, we can undo those changes. So let's undo what we did before. I press Z, and we get the title back.

Another useful shortcut is Ctrl+S, which allows us to save the changes we made in this Jupyter Notebook file. So I press Ctrl+S, and you can see that it says Checkpoint created. I'm going to press Ctrl+S again, and it says Checkpoint created and also shows the time. These are some of the most common shortcuts used in Jupyter Notebook, but you can see other shortcuts by pressing the letter H. So press H, and here you can see more keyboard shortcuts. Or you can also go to Help and then Keyboard Shortcuts, and you get the same window.
So here you can see a list of shortcuts for command mode and also for edit mode. You can see the description of each shortcut and also how to do it on your operating system.

One of the typical ways to get started with a programming language like Python is printing a simple message. You can write any message you want, but it's traditional among coders to start with Hello World. So let's try it. Let's print our first message using the print function. The print function prints a message to the screen. So I'm going to write print here, and then I'm going to open parentheses. Every time we call a function in Python, we have to use parentheses, in this case for the print function. As you can see, functions get a green color in Jupyter Notebook, so that's how you can identify them. Inside these parentheses, I'm going to write the message, inside quotes. In this case, it's going to be Hello World. So this is our first message.

Now, to execute this first line of code, we have to press Ctrl and Enter, or Command and Enter if you're on Mac. So I'm going to press this, and as you can see, we have our first Hello World. Another way to run this first cell is clicking here on the Run button; it's going to have the same effect. So I clicked and it ran. As you can see here, it says In, which represents a code cell, and this one is a Markdown cell, as we've seen before.

One of the advantages of Jupyter Notebook is that it displays the last object in a code cell without us calling the print function. For example, I can display this Hello World without writing the print function. So I'm going to copy this Hello World message that is inside quotes, and I'm going to run this code with Ctrl+Enter. As you can see, we have this message displayed. This is one of the advantages that Jupyter Notebook has; if you do this in another Python IDE, it won't work.
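As a minimal sketch of the two ways of displaying a value described above (the variable name `greeting` is only for illustration and does not come from the video):

```python
# Explicitly print a message; this works in a notebook cell and in a plain script.
greeting = "Hello World"
print(greeting)

# In a Jupyter code cell, a bare expression on the last line is displayed
# automatically. In a regular .py script, this line displays nothing.
greeting
```

In a notebook, the cell above shows the message twice: once from `print` and once as the cell's output value (with quotes around it).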
So here you can try it yourself. You can write any message you want. Apart from the first Hello World, you can try with your name. So we write print and then parentheses, and we open quotes, because we need to define a string. I'm going to tell you about strings a little bit later, but just so you know for now. Here, for example, I can write my name. My name is Frank, and I can print my name. Then I can also print numbers. So I print my age, 26, and it's going to work too.

Besides writing code, you can also add comments. Comments are a useful way to describe what we're doing in our code. To write a comment, we just have to type the hash sign, which is this one, and then write the comment. In this case, above the line that prints my name, I'm going to write printing my name, so we know what our code is doing. Here, in the first message we wrote, we can also add a comment. So we write the hash sign, and then we can say printing my first message. As you can see, the comments also have a different color. So far, we have three colors: this color for comments, green for functions, and red for strings. This is just a useful functionality most text editors have that allows us to read code easily.

Okay, now let's see some data types in Python. Every value in Python is an object, and objects have different data types. Let's see the most common data types in Python. Two of the most common data types in Python are integers and floats. Both are numbers, but integers are numbers that can be written without a fractional component, for example, the numbers 1, 2, 3, 4, 5, and so on. All of them are integers. We can check this data type by using the type function. This is the second function we're going to see. So we write type, put the number inside the parentheses, and run this code.
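The comment syntax described above can be sketched like this. The variable names `name` and `age` are illustrative assumptions; in the video the values are passed to `print` directly:

```python
# Printing my first message
print("Hello World")

# Printing my name (text must go inside quotes to be a string)
name = "Frank"
print(name)

# Printing my age (numbers do not need quotes)
age = 26
print(age)
```

Everything after a hash sign on a line is ignored by Python, so comments never change what the code does.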
As you can see here, in the output it says int, which represents integer, so this is an integer. Okay, the second data type I want to show you is float. Floats are numbers that contain a decimal point, for example 2.3, 1.25, 5.4, and so on. So here we have another type of data, and let's check whether these are actually floats. We use type with the number inside the parentheses and run this code, and we see that we get float.

Just like in Excel, you can perform math operations in Python using these numbers. One operation you can use is addition: for example, you can write one plus two, execute this code, and you get three. You can use subtraction: four minus one, you run this code, and you get three. You can also do multiplication, division, exponents, and more in Python.

Now let's see the third data type that we will see often in Python, the Boolean. Booleans are True or False values. We can check this using the type function again. We write type, and within the parentheses we write, for example, True, with a capital T. We run this code and see that we got bool, which represents the Boolean data type. We can also write type with False, run this code, and we get bool again, so this is a Boolean. We're going to use Booleans often when we work with conditionals.

Okay, now the fourth data type I want to show you, and it's very common, is the string. A string represents a series of characters. In Python, anything inside quotes, either single quotes or double quotes, is a string. We actually already saw one string when we printed Hello World, so you're familiar with this, but we're going to see it again. To create a string, we have to open either single or double quotes. In this case, I'm going to use double quotes, as you see now. And now I'm going to write any message.
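A short sketch of the type checks and math operations mentioned above. The variable names are assumptions added so the results can be inspected; the specific numbers for multiplication, division, and exponents are illustrative and not from the video:

```python
# type() reports the data type of a value.
int_type = type(1)        # int
float_type = type(2.3)    # float
bool_type = type(True)    # bool (True and False are capitalized in Python)

# Basic arithmetic, much like formulas in Excel.
total = 1 + 2        # addition
difference = 4 - 1   # subtraction
product = 3 * 2      # multiplication
quotient = 7 / 2     # division; always returns a float
power = 2 ** 3       # exponent

print(int_type, float_type, bool_type)
print(total, difference, product, quotient, power)
```

Note that Python uses `**` for exponents, where Excel uses `^`.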
So I'm going to write, for example, Hello World again. To verify the type, we can use the type function with the string inside the parentheses. We run this code, and we get str, which represents a string.

One cool thing strings have is methods. We can apply different functions to strings, as we would do in Microsoft Excel, for example. However, in Python we use methods. A method is a function that belongs to an object. To call a method, we use the dot sign after the object. Let's see some string methods that change the case of text. So here I'm going to write Hello World again, but now I'm going to use some string methods. In this case, I'm going to use the upper method to make this uppercase. I could use the print function, but actually we don't need it because, as I told you before, Jupyter Notebook automatically displays the last line of code in a cell. Since this is the only line of code in this cell, it's going to display it automatically. So we just run this cell, and we have Hello World in uppercase.

As you might expect, we can also change the text to lowercase or title case. So I'm going to copy and paste this twice. Here, instead of upper, I'm going to use lower, and then title, so you can see how the case changes. I'm going to run this, and let's see what happens. As you can see, it only displayed the last one because, as I told you before, it only displays the last line. If we want to display the three of them, we have two options. We can cut and paste each one into its own cell, or we can print each of them. So here, for example, I can add print, and I can do the same for the others. So instead of using more cells, we can print all of them. And here, we can print this one too.
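The case-changing methods described above might look like the following sketch. The exact capitalization of the message (`"Hello World"`) and the variable names are assumptions:

```python
text = "Hello World"

# Each method returns a new string; the original string is left unchanged.
upper_text = text.upper()
lower_text = text.lower()
title_text = text.title()

# print() is used for every line because a notebook cell
# only displays the value of its last line automatically.
print(upper_text)
print(lower_text)
print(title_text)
```

For Excel users, these roughly correspond to the `UPPER`, `LOWER`, and `PROPER` worksheet functions.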
Actually, we don't need print on the last one, because the notebook is going to display the last line anyway. But just for the sake of this video, I'm going to print the three of them. So I'm going to run this code, and as you can see, the first one is in uppercase, the second in lowercase, and the third in title case. So that's how you do it in Python.

Another string method that you can find in Python is the count method. I'm going to delete this, and this one too, and we're going to see it now. First, I copy this and paste it here, and here I'm going to use the count method. So I write count, then I open single quotes and write the letter that we want to count. For example, I'm going to write the letter l. This string method counts how many times the letter l appears in the string. As we can see, there are two l's, so it should say two. I run this code, and actually it's three, because there are two in Hello and one in World. So I was wrong.

Another string method that you can use is the replace method, which replaces one letter with another. Let me copy this and paste it here, and instead of writing count, I write replace. The first letter that we pass here is the letter that we want to replace. In this case, I'm going to replace the letter o. The second letter is the letter that you want to put in the string, so I'm going to use the u. So every time an o appears in the string, we're going to replace it with the vowel u. Let's try it. I run this code, and now it says Hello World, but with u in place of each o. These are some of the most common string methods in Python.

Okay, now it's time to learn something that you're going to see often in Python: variables. Variables help us store data values.
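The count and replace string methods covered above can be sketched as follows, again assuming the message is written as `"Hello World"` (the variable names are illustrative):

```python
text = "Hello World"

# count() returns how many times the given substring appears.
l_count = text.count("l")          # two in "Hello", one in "World"

# replace(old, new) returns a new string with every "o" replaced by "u".
replaced = text.replace("o", "u")

print(l_count)
print(replaced)
```

Both methods are case-sensitive, so `text.count("L")` would not count the lowercase letters.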
In Python, we often work with data, so variables are useful to manage this data properly. A variable contains a value, which is the information associated with the variable. To assign a value to a variable, we use the equal sign. So let's create a message that says "I'm learning Python" and store it in a variable called message underscore one. So here, I write message underscore one, and we set it to the string "I'm learning Python". So I open double quotes and write I'm learning Python. This is a string, which we've seen before, and this is the variable, and we assign this value to the variable using the equal sign. Now I'm going to run this and, as you can see, nothing happens. But actually, we just assigned that string to the variable message underscore one. Now, if we want to obtain the message "I'm learning Python", we only have to type the variable name and then execute that code. So I'm going to copy and paste it here and run this code. As you can see, by running this cell, we obtain the content of the variable message underscore one. We can create as many variables as we want. Just make sure to assign different names to new variables. So let's create a new message that says "and it's fun" and store it in a variable called message underscore two. So first, I write message, then underscore two, and then we set this equal to, open double quotes, and write and it's fun. This is my second variable, and I'm going to run this cell. So the string was assigned to the second variable, and if I copy and paste this variable here and run this code, we can see that the message is there. By the way, if you're using single quotes instead of the double quotes I'm using in this video, you'll probably run into the following issue. So here, I'm going to copy this one and paste it here so you can see what I'm talking about.
So let's say you're using single quotes instead of double quotes. Then you get this. This is a problem you will have when using single quotes, because in English we use apostrophes often. A simple way to deal with this is to use double quotes. As you can see here, if I use double quotes, everything is okay and everything remains a string. But with single quotes, that doesn't happen. Only the I is treated as a string, and the rest doesn't get the string data type. So just make sure you use double quotes every time you have an apostrophe, and that's it.

Okay, now let's put these two messages together, message one with message two. This is called string concatenation. If we want to put message one and message two together, we can use the plus operator. So I'm going to copy the variable message underscore one, and then the variable message underscore two, and I use the plus in the middle to concatenate the first message with the second message. So I run it, and let's see what happens. Here, we can see that the two messages were concatenated, but there isn't a space between them. This is the first message and this is the second, and there isn't any blank space in the middle. So what we can do here is just add a blank space. I'm going to copy this one and paste it here to show you how to do it. Here I add a new plus operator in the middle and open a string, with single quotes or double quotes. In this case, I'm going to use single quotes, and to create the blank space, I press the space bar. Then we run this code, and as we can see, there's a space between "Python" and "and". So we have this blank space.
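The variables and the concatenation described above can be sketched like this. The names message_1 and message_2 come from the video, while no_space and with_space are names made up for this illustration.

```python
# Two string variables, written with double quotes because of the apostrophes.
message_1 = "I'm learning Python"
message_2 = "and it's fun"

# Plain concatenation glues the strings together with nothing in between.
no_space = message_1 + message_2

# Adding a one-space string in the middle separates the two messages.
with_space = message_1 + ' ' + message_2

print(no_space)
print(with_space)
```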
And if we want, we can assign this new message to a new variable. So I'm going to assign this to a variable called message. I write message here, and below I can print it. As you can see, if I run this, the message is there.

Okay, now let me show you an alternative way to join two strings. This is called the f-string, and it works like this. You write f and then open a string, so we write single quotes here, one and two. As you can see, the whole thing is red, so everything inside here is a string, and inside we can write the message. Let's say we write a simple "Hello World", and we run this. As you can see, this is a string. It just has this f in front of it. One of the advantages of the f-string is that it can have variables inside the string. Here, for example, we can write a variable by opening curly braces. These curly braces can have variables inside them. So here, I can write message underscore one and print it, and we get the string "I'm learning Python". Now, if we want to concatenate this first message with our second message, we just have to add curly braces again, and inside them I write message underscore two. Between message one and message two, I just press the space bar, and we get "I'm learning Python and it's fun". That space also appears in the output. For example, if we add some random text there, let's say ABC, we get this ABC between "Python" and "and". So this is how the f-string works. You just have to write the f, then open single quotes, and inside you can write any message. To include a variable, you open curly braces and write the variable name, and that's how to join strings.

Okay, now it's time to see a data type that is used a lot.
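Going back to f-strings for a moment, here is a minimal sketch of the same joins. The variable names message and with_text are only illustrative.

```python
message_1 = "I'm learning Python"
message_2 = "and it's fun"

# An f-string: the f goes right before the opening quote,
# and each variable goes inside curly braces.
message = f'{message_1} {message_2}'

# Anything typed between the braces, like this ABC, shows up as-is.
with_text = f'{message_1} ABC {message_2}'

print(message)
print(with_text)
```

Compared with the plus operator, the f-string keeps the spaces and any extra text visible in one place, which tends to be easier to read once more than two pieces are involved.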
Often in data analysis, we work with lists. In Python, lists are used to store multiple items in a single variable. Lists are ordered and mutable containers. In Python, we call mutable the objects that can change their values, that is, elements within a list can change their values. To create a list, we have to put the elements inside square brackets, separated by commas. So let's create our first list. First, we have to set the name of the list. In this case, I'm going to name it countries. Now, to create the list, we open square brackets, as I said before, and inside we write the elements. I'm going to include only strings in this countries list, and they're going to be names of countries. For the first one, I'm going to write United States. This is the first element in my list, and to write the second, we have to use a comma. So comma, and now the second. Let's write India, then a comma, then China, and finally Brazil. So these are the four countries. As you can see, this is a list. We have the square brackets that represent the list, and we have four strings. This is how you create a list. Now I'm going to run this one, and to see the content, I paste the name of the list and run it. Here, I included only strings, but keep in mind that lists can have elements of different types, for example, a string, then an integer, then a float, and so on. Lists can also have duplicate elements. For example, I can write United States twice, and that's okay, because lists can have duplicate elements. But I don't want it that way, so I'm going to delete it and leave the list as it was.

Okay, now, if we want to get an element inside this list, we have to use something called indexing.
With indexing, we can obtain an element by its position. Each item in a list has an index, which is its position in the list. Python uses zero-based indexing, that is, the first element, United States, has index zero, the second, India, has index one, and so on. To access an element by its index, we use square brackets again. So let's see some examples. Let's start by getting the first element, United States. What we have to do is write the name of the list, in this case countries, and then open square brackets, and inside the square brackets we write the position of the element. Indexing starts at zero, so we write zero to get the first element. Then we run this code and, as you can see, we get the first element. If we write countries, square brackets, one, we get India. If we write countries, square brackets, two, we get China, and if we do this with the number three, we get Brazil. To verify this, I'm going to print each of them. So here print, and finally print this one. Now I'm going to run this, and we should get each element of the list, from United States to Brazil. So let's try it out. Here we have each of them: United States first, then India, then China, and then Brazil. So it's correct. This is the most common way to use indexing, but there is also negative indexing, which helps us get elements starting from the last position of the list. Instead of using indexes from zero upward, we use indexes from minus one downward. So let's get the last element of the list, which is Brazil, but now using a negative index. We did it before with countries, square brackets, three, but now we're going to do it with negative indexing. So here, I copy and paste countries, and now I open square brackets.
And instead of writing three, we're going to write minus one. This minus one represents the first element starting from the last position. So Brazil is minus one, China is minus two, India is minus three, and United States is minus four. That's how it works. So I'm going to run this one, countries, square brackets, minus one, and we should get Brazil. And we got it. Let's do this one more time. In this case, I want to get United States, which is minus one, two, three, four, so it's countries, square brackets, minus four. We run this, and we get United States, but now using a negative index.

Okay, now let's see something called slicing. Slicing means accessing parts of a list. A slice is a subset of the list's elements. Slice notation takes this form: the list name, then square brackets, and inside them start, then a colon, then stop. Start represents the index of the first element, and stop represents the element to stop at, without including it in the slice. So let's see some examples. I'm going to use the countries list again, so I copy it and paste it here. This is the name of my list, and now I open square brackets. Let's say we start at position zero, then a colon, and we want to get from position zero to position two, so we have to write three, because the slice stops at three without including the element at position three. So let's run this one. As you can see, we have index zero, index one, and index two, and it didn't include index three. Now, let's say we want just the first element, so we write from zero to one. It's only index zero, not index one, because the slice stops at one without including it. So I run this, and we get only United States. Now let's try something different. Let's say we want to get the elements from index one to the last one. So, let me see here.
We want to get from India to Brazil, so that's indexes one, two, and three. We have to write four as the stop, because the slice stops at four, and so the last element we get is index three. So let's write one, colon, four here, and we should get, yeah, India, China, and Brazil. This is one way to do it, but another way is to just delete the stop, leave the rest as it is, and run the code. As we can see, we get the same result. So every time you want to get from one position to the last one, you can omit the stop element, just as we did here. The same goes for the start. Let's say we want to get from the first position, index zero, up to but not including index two. We leave out the start element and write only a colon and two. We run this, and we get United States and then India, because this is the first element and this is the second. So every time we want to start from the first element or go up to the last element, we can omit the start or the stop, as we did in these two examples.

Okay, now let's see how we can add elements to a list. There are different methods that help us add a new element to a list, so let's have a look. The first one is called append, and we're going to use the countries list as an example. I'm going to write countries just so you can remember it. As you can see, it has four elements. Let's say we want to add a country to this countries list. What we can do is paste countries here and then write dot append. Append is a method, so inside the parentheses we write the new country we want to add to the list. Let's say we want to add Canada, so I write Canada. Now we run this code. As you can see, nothing is printed, but if we print the countries list again, we see a new element. So the append method adds a new element at the end of the list.
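To recap the indexing and slicing examples from a moment ago, here is a minimal sketch. It uses the original four-country list (before Canada was appended), and the variable names on the left are made up for this illustration.

```python
# The original countries list, before any element was added.
countries = ['United States', 'India', 'China', 'Brazil']

first = countries[0]    # zero-based indexing: position 0 is the first element
last = countries[-1]    # negative indexing: -1 is the last element
also_first = countries[-4]

first_three = countries[0:3]  # indexes 0, 1, 2 (the stop, 3, is not included)
only_first = countries[0:1]   # just index 0
from_india = countries[1:4]   # indexes 1, 2, 3
same_thing = countries[1:]    # omitting stop means "up to the last element"
first_two = countries[:2]     # omitting start means "from the first element"

print(first, last, also_first)
print(first_three)
print(only_first)
print(from_india)
print(same_thing)
print(first_two)
```

Note that indexing with a single number returns one element, while a slice always returns a list, even when it contains only one element.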
So by default, the element goes at the end. But what happens if you want to add an element in a different position? For that, you can use another method, called the insert method. Let me show you. I'm going to copy countries, and now I write dot insert, then parentheses. This method accepts two arguments. The first one is the index, that is, the position where you want to insert the element. Let's say we want it in the first position. The second argument is the new element you want to add. In this case, let's say we want to add the element Spain, which is another country, and it's going to be in the first position, so index zero. So let's try it. I run this one, and again, apparently nothing happens. But if I run the countries list again, we can see that there is a new element, Spain, and it's located in the first position, unlike Canada, which was placed in the last position. This is one of the differences between the append method and the insert method. With insert, we can specify the position where we want to insert the new element, but with append, the element is always added at the last position. Another thing you can do is join two lists using the plus operator. We used the plus operator to concatenate strings before, but you can also use it to join two lists. So let me show you. I'm going to create a new list just to show you how it works. My new list is going to be called countries underscore two, and I'm going to include different countries. In this case, it's going to be UK, then Germany, and, let's write, Austria. So we have three countries in this new list, and now I'm going to run this one. If we want to concatenate the first list, countries, with the second list, countries underscore two, we can use the plus operator. So here, I write plus, and then I run this one.
And as you can see, I got the elements from the first list followed by the three elements from the second list. Another cool thing you can do in Python is put these two lists inside another list, which is called a nested list. So let's try it out. Here, I'm going to create a new list called nested underscore list. I open square brackets to create the new list, and as elements I write countries, which is my first list, then a comma, and then countries underscore two, which is my second list. As you can see, the first element inside this list is a list, and the second is also a list. So we have lists inside another list, which is called a nested list. I run this one, then I paste nested underscore list and run it, and we get the first list as the first element and the second list as the second element. You won't see nested lists very often, but you will encounter them a couple of times, so it's good for you to know about them.

So now we're going to see the opposite of adding an element to a list, which is removing an element. Here, I just pasted the countries list we had before, and what we're going to do is remove some of its elements. There are different methods that help us remove an element from a list. One of them is the remove method. To remove an element using it, we first write the name of the list, then the dot sign, then remove, and then parentheses, and inside them we write the element we want to get rid of. First, it's United States, so I write United States, and let's run this one. As you can see, apparently nothing happens. But if we paste countries here, we have all the other elements, and United States is not there. So the first matching value was removed. But you can also remove an element by its index. This is accomplished with the pop method.
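Before getting into pop, here is a minimal sketch of the ways to add elements that were just covered: append, insert, joining with the plus operator, and nesting. The list names follow the video, except joined, which is a name made up for this illustration.

```python
countries = ['United States', 'India', 'China', 'Brazil']

countries.append('Canada')    # adds at the end of the list
countries.insert(0, 'Spain')  # adds at the given index (0 = first position)

countries_2 = ['UK', 'Germany', 'Austria']

# The plus operator builds a new list with the elements of both lists.
joined = countries + countries_2

# A nested list holds the two lists themselves as its two elements.
nested_list = [countries, countries_2]

print(countries)
print(joined)
print(nested_list)
```

The difference between the last two results is the shape: joined is one flat list of countries, while nested_list has only two elements, each of which is a list.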
So I'm going to copy all of this and paste it here. Instead of writing dot remove, I'm going to write dot pop, and here I'm not going to use the name of the element but its index. In this case, let's remove the last one, so it's going to be index minus one. What pop does is remove the element at index minus one and then return that element. So the returned element is Canada. I didn't run this code here, so you can ignore it, and I'm going to comment this one out. Our reference is going to be this list. To verify, we write countries and run it, and as you can see, Canada isn't there anymore. That's how you remove an element using the pop method. But there's still another way to remove an item at a specific index, and it's del. So I'm going to show you here. I write del, which is a Python statement rather than a method, and then the countries list, and then I open square brackets and write the index inside. Unlike the pop method, we're not going to get back the element we're getting rid of. We're just deleting it. So I run this one, and we didn't get anything. Now I'm going to print countries, and the element at index zero was removed. That was Spain, because it was the first element, so we removed the first element, and now we only have India, China, and Brazil. And there you have it, three different ways to remove an element from a list.

Okay, now let's see how to sort a list. We can easily sort a list using the sort method. Let's create a new list called numbers and then sort it from the smallest to the largest number. So first I write numbers, and then I open square brackets and write some random numbers: first four, then three, then ten, then seven, one, and then two. So this is my list, and I run this code.
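The three ways of removing an element described above can be sketched as follows. The starting list is assumed to be the six-country list left over from the append and insert examples, and popped is a name made up for this illustration.

```python
# Assumed starting point: the list after Spain and Canada were added.
countries = ['Spain', 'United States', 'India', 'China', 'Brazil', 'Canada']

# remove: deletes the first element that matches the given value.
countries.remove('United States')

# pop: deletes the element at the given index and returns it.
popped = countries.pop(-1)

# del: deletes the element at the given index and returns nothing.
del countries[0]

print(popped)
print(countries)
```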
And now, to sort it from the smallest to the largest number, we write numbers, then dot sort, then open parentheses. By default, the list is going to be sorted from the smallest to the largest number. So I run numbers again, and it starts with one and ends with ten. As you can see, it goes from the smallest to the largest number. That's the default behavior of the sort method, but we can control how it works. We can add the reverse argument to the sort method to control the order. If we want the order to be descending, we set reverse to True. So here, I'm going to create the numbers list again, and then write numbers dot sort, and inside the parentheses I write the reverse argument and set it to True. Then I'm going to print numbers. Here I get an error, because I wrote number and it's numbers. So I add the s here, and here too, and run it again. And here we see that the list is sorted from the largest number to the smallest. So the default behavior of the sort method is reverse equal to False, and you can change it by writing reverse equal to True, as we did here.

Okay, now let's see how we can update values in a list. To update a value in a list, we use indexing to locate the element we want to update, and then we set it to a new value using the equal sign. Let's say we want to update the first element of this numbers list. Right now it's four, but we want it to be, let's say, 1000. So we write numbers and use indexing. The first element has index zero, so we write numbers, square brackets, zero, and then we set it equal to the new value. In this case, I'm going to write 1000. And now I'm going to print the numbers list to see the result. So I run this one.
And as you can see, the numbers list we got reflects the last change we made, so it's the one that starts with ten. It's not this one but this one, because it's the last one we ran. So instead of ten, we now have 1000, because ten was the first element, with index zero. We wrote numbers, square brackets, zero, and we updated that first element to 1000.

Okay, finally, we can make copies of the lists we created. There are different options to create a copy of a list. One of them is the slicing technique. As you might remember, to do slicing we first write the name of the list, which in this case is countries, and then we open square brackets, where we're supposed to write the start and the stop. In this case, we're not going to write start or stop, only the colon. If we don't write start and we don't write stop, it means we want the whole list. So let's try this out. I'm going to run this one and, as you can see, we got the whole list. The countries list doesn't have the original values anymore, because of the changes we made when we added and removed elements, so I'm going to paste the original countries list with its four original values, which are United States, India, China, and Brazil. Now let's test it again. As you can see, we got the whole list, from the first element, United States, to the last element, Brazil, because we're slicing the whole list. So if we write new underscore list and set it equal to countries with this slicing, this new list is going to have the same values as the countries list. I write new underscore list here and, as you can see, it has the same values. So we created a copy of the countries list. This is one way to create a copy. The second way is more straightforward, or more explicit, and that is using the copy method.
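Going back to sorting and updating for a moment, here is a minimal sketch of those steps on the numbers list. The names smallest and largest are made up for this illustration, just to capture the state after the first sort.

```python
numbers = [4, 3, 10, 7, 1, 2]

# Default sort: ascending, the same as reverse=False.
numbers.sort()
smallest = numbers[0]
largest = numbers[-1]
print(numbers)

# reverse=True sorts in descending order.
numbers.sort(reverse=True)
print(numbers)

# Updating by index: the first element (10 after the descending sort)
# is replaced with 1000.
numbers[0] = 1000
print(numbers)
```

The sort method changes the list in place and returns nothing, which is why the list has to be printed afterwards to see the result.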
So we write countries again, the name of the list, and then we use the copy method, so dot copy and then parentheses. With this, we create a copy of the list. So let's run this code. As you can see, it returns the list, but if we assign this to a new name, we keep the copy. So here, I'm going to write new underscore list underscore two and assign the copy to it. Then I copy this new list's name and paste it here, and as you can see, it has the same values as the original countries list. And that's it. That's how you make a copy of a list.

So now let's see how dictionaries work in Python. In Python, a dictionary is a collection of items used to store data values, and each item consists of a key and a value. This is what you will often see in a dictionary. Here, for example, the name of my dictionary is my underscore dict, and to create this dictionary we use curly braces. We open curly braces and inside we write our first item. The item consists of a key here on the left and a value on the right, separated by a colon. So we have the key, then a colon, and then the value. Then we have the second item, so the second key and the second value. Now let's create a dictionary that has some basic information about me. I'm going to name this dictionary my underscore data, and to create it, I open curly braces. The first key is going to be name, so I write name, and its value is my name. So I open single quotes and write Frank. Then I'm going to add a new item, so I write a comma. The second key is going to be age, and the second value is going to be my age, which is 26.
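Going back to list copies for a moment, the two techniques can be sketched like this. The final append is not in the video at this point. It is added here only to show that both copies are independent of the original list.

```python
countries = ['United States', 'India', 'China', 'Brazil']

# Option 1: slice the whole list (no start, no stop).
new_list = countries[:]

# Option 2: the more explicit copy method.
new_list_2 = countries.copy()

# Illustrative extra step: changing the original afterwards
# does not affect either copy.
countries.append('Canada')

print(countries)
print(new_list)
print(new_list_2)
```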
So as you can see here, the first value is a string and the second is an integer, so we can mix different data types. Now I press Ctrl+Enter to run this code, and we've created the dictionary. If I write my underscore data, here you have the dictionary we created. We can get the keys of this dictionary. We only have to write my underscore data dot keys, with parentheses, which is the keys method. We run this, and we get a dict underscore keys object containing name and age, which are the keys of the dictionary we created: name is the first key and age is the second. We can also get the values, so my name and my age. We just have to use the values method. I'm going to paste this one here, and instead of dot keys I write dot values. Now I run this, and we get my name and then my age. Next, I'm going to get the items. As I said before, an item is this: this is the first item, and this is the second item. So we can say that an item is a key-value pair. We can get the items by using the items method. Instead of dot values, I write dot items here and run it. Here we get the first item, the first key-value pair, which is the key name and then my name, Frank, and then the second item, with the key age and the value 26. Now we can add a new key-value pair to the dictionary we created. Let's say we want to add my height, with the key height. We use square brackets here, and then we set this to the value, let's say 1.7. So I write my underscore data, then square brackets with height inside, and then equal to 1.7. If I run this and then run the dictionary, we can see that there is a new item, the height. So height, colon, and then 1.7.
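The dictionary steps so far can be sketched as follows. Wrapping keys, values, and items in list() is an addition made here so the results print as plain lists. In the notebook, the methods return dict_keys, dict_values, and dict_items objects instead.

```python
my_data = {'name': 'Frank', 'age': 26}

# list() is used only to turn the returned view objects into plain lists.
keys = list(my_data.keys())
values = list(my_data.values())
items = list(my_data.items())  # each item is a (key, value) pair

print(keys)
print(values)
print(items)

# Assigning to a key that does not exist yet adds a new key-value pair.
my_data['height'] = 1.7
print(my_data)
```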
This is how you add a new item to the dictionary. Now we can update this height. Let's say I'm not 1.7 but 1.8 meters. What we can do is use the update method to update this value. So I write my underscore data, and here I use the update method. I write dot update, and inside the parentheses we open curly braces to pass the item we want to update. I write the key, which is height, and then I set the new height, which is 1.8. So let's try this out. I run this, and then let's look at the values to see if it was updated. I run this, and we get height 1.8, so it's perfect.

So now let's see how we can make a copy of a dictionary, the same way we did before for lists. To make a copy, we write the name of the dictionary, in this case my underscore data, and then, just as we did for the list, we use the copy method. So we write dot copy with parentheses, and that creates a copy. Here you can see the copy. Now I can assign it to a new dictionary, so I'm going to write new underscore dict, run this, and then write new underscore dict and run it. As you can see, it has the values of the my underscore data dictionary. Something I didn't tell you when I made a copy of the list is that if you change the data inside the my underscore data dictionary, the old dictionary, that change is not going to show up in the new dictionary. For example, if we write 1.9 here and update the old dictionary, you can see height 1.9 there. But if we run new underscore dict, we can see that the height keeps the same value, 1.8, and doesn't change to 1.9. This independence is lost if you make the kind of copy most people make. So let me show you what I'm talking about.
So most people just make a copy by writing new underscore dict underscore two equal to my underscore data. This is the old dictionary, and this is my new dictionary. So what happens if I run this? I'm going to show you the values of this new dictionary. The height is 1.9. And if I update the old dictionary to, let's say, 1.95, and then run new underscore dict underscore two, we can see that the value was updated there too, and this shouldn't happen. That's because the equal sign doesn't copy the dictionary. It only gives a second name to the same dictionary. So if you want to create a new dictionary that works independently from the old dictionary, you should use the copy method. And the same goes for making a copy of a list.

Finally, let's see how to remove elements from a dictionary. Just like we did with lists, we can remove an item from a dictionary, and there are different options. First, we have the pop method. So I write my underscore data, the old dictionary we've been using so far, and then dot pop. Here I can write the key. In this case, let me see, I'm going to use the key name, so I write name inside the parentheses. As you might remember, the pop method returns the value of that key. Before, we did this with a list and it returned the list element. In this case, it returns the value of the key. So this is the key, name, and it returns the value. If we print the my underscore data dictionary, we see that this key-value pair isn't there anymore, so we successfully removed the item. Another way to remove an item from a dictionary is using the del keyword. We write del, then the name of the dictionary, my underscore data, and then we specify the name of the key again. So we open square brackets and open quotes. Here, let's say we want to remove the age key along with its value.
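Going back to the copy discussion for a moment, here is a minimal sketch of the difference between the copy method and a plain assignment. The starting dictionary is assumed to be the one built earlier, with the height already added.

```python
my_data = {'name': 'Frank', 'age': 26, 'height': 1.7}

# update changes the value of an existing key.
my_data.update({'height': 1.8})

new_dict = my_data.copy()  # an independent copy
new_dict_2 = my_data       # NOT a copy: just a second name for the same dictionary

# Change the old dictionary after both "copies" were made.
my_data.update({'height': 1.9})

print(new_dict['height'])    # still the old value
print(new_dict_2['height'])  # follows the change
```

The is operator can confirm what happened: new_dict_2 is my_data evaluates to True, while new_dict is my_data evaluates to False.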
So I write age, and we run this. Then if we print the dictionary again, we see that the age key was removed, along with its value. And finally, you can remove all the items in a dictionary with the clear method. We write my underscore data and use dot clear with parentheses. Now if we print the dictionary, you can see that it's an empty dictionary, because we removed all of its elements.

Now let's see one of the most common statements used in Python, the if statement. The if statement is a conditional statement used to decide whether a certain statement or block of statements will be executed or not. Here, you can see the syntax of the if statement. As you can see, it starts with the if keyword, followed by the condition. If the condition is true, this code here is going to be executed. If the condition is false, the condition in the elif is going to be tested. So here in the elif block, this new condition is tested, and if it's true, the code below it is executed. But if it's not true, we move on to the else block. This is the last block and it has no condition, so its code is executed automatically. One little detail that most beginners forget to write is the colon. It's sometimes easy to forget it's there, but you have to include it. Another thing some people miss is the indentation. You have to include an indentation after the colon. Every time you write a colon and press Enter, most text editors give you this indentation automatically. But if for some reason you don't get the indentation, and you get something like this, you can indent the line by using the Tab key on your keyboard. Just press Tab, and it's going to indent the line. So make sure you write the colon and include an indentation for each piece of code that will be executed.
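Stepping back to dictionaries for a second, the three ways of removing items covered above can be sketched like this. The starting dictionary is assumed to hold the values from the earlier examples, and removed_name is a name made up for this illustration.

```python
my_data = {'name': 'Frank', 'age': 26, 'height': 1.95}

# pop: removes the key and returns its value.
removed_name = my_data.pop('name')
print(removed_name)

# del: removes the key and its value without returning anything.
del my_data['age']
after_del = dict(my_data)  # snapshot for illustration only
print(after_del)

# clear: removes every item, leaving an empty dictionary.
my_data.clear()
print(my_data)
```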
So here, here and here. Now let's have a look at some examples to see much better how the if statement works. First, I'm going to create a new variable. As you might remember, to create a variable, you have to write the name of the variable. In this case, I'm going to name it age, and then you have to assign it a value. In this case, this is going to be a number, so I'm going to set age to the number 18. And now I'm gonna write the if statement. So I write if age is greater than or equal to 18, then a colon, and then the code that is going to be executed. If this is true, I'm going to write print and then a message. So if the age is equal to or greater than 18, I'm gonna print the message "You're an adult". And as you can see here, I'm using single quotes and I wrote an apostrophe, which breaks the string. So I'm going to use double quotes, and everything is fine now. So here we have print, and then the message "You're an adult". If this isn't true, I write else, then a colon, and print, here with a new message, which is "You're a kid". So let's see this again. If the age is equal to or greater than 18, we print "You're an adult". But if it's less than 18, we print "You're a kid". If we run this code, we should get the first message, because 18 is equal to 18. So let's run it, and as you can see, we get the message "You're an adult". Now we can play with this and change the age value. Here, I'm going to set it to 15. So I run it, and as you can see, 15 is less than 18, so the condition is false and the else block is executed. So we got "You're a kid". We can try this one more time. In this case, I'm going to write another age, 30. Again, 30 is greater than 18, so the first block is executed and we get "You're an adult". So now let's add a new block, and I'm gonna use the elif. So I write elif, then age, and then greater than or equal to, let's say, 13.
And then a colon, press enter, and we get this indentation. Then we print another message. So if the age is equal to or greater than 13, we print the message "You're a teenager". So if it's between 13 and 17, or well, less than 18, it's going to be "You're a teenager". But if it's less than 13, it's going to be "You're a kid". So let's try this out. First I write 10, and we get "You're a kid", because it's less than 13. Then we change this to 14, and we get "You're a teenager", because 14 is greater than 13. And finally we write 20, and we get "You're an adult", because 20 is greater than 18. And that's it. That's how the if statement works. Now it's time to see one of the most common loops in Python: the for loop. Python for loops are used to loop through an iterable object and perform the same action for each entry. One example of an iterable object is a list. So we can loop through each element of a list and perform the same action on each element. Here you can see the syntax of the for loop. As you can see, here is the for keyword, then we have to use a variable, then we have to write the in keyword, and then the iterable. In this case, as I told you before, the most common is the list. So you have for variable in list. I'm gonna write list here so you can see it much better, and then we have to write the colon. After the colon goes an indentation. So here we have the indentation and the code that will be executed on each iteration of the for loop. To see this much better, I'm going to use the countries list we created before. So this is the countries list, and I'm going to loop through it. So I write for, and then we have to set a variable that is just temporary. This variable is going to be called country. This variable doesn't exist yet; we just create it temporarily.
So for country in, and then we have to write the name of the iterable, which in this case is a list: countries. So for country in countries, then a colon, then enter, and we get this indentation. Then we say print(country). So for this variable in this iterable, which is a list, print each element. This is what we're saying in this for loop. So we run this, and as you can see, each element of the countries list is printed. We're looping through the countries list and printing each element. So the first is the United States, then India, then China and Brazil. And this is how the for loop works. Now, let me show you a new function that you can use along with a for loop. It's called enumerate. So I'm going to write enumerate here, and I'm going to put the countries list inside this new function. What the enumerate function does is number each element of the countries list as we loop through it. So I'm going to add a new variable here, and it's going to be i, then a comma, and then country. So enumerate will return two elements: the first one is the number of the iteration, and the second one is the element itself. So here I have to print, apart from the country, the i variable that I just created, which again is just temporary. So I write print(i) and then print(country). Here we're going to print the iteration number and the element. So I press Ctrl+Enter, and here we get it. First is the United States, in the first iteration, which is zero; then we get India in the second iteration, which is one, and so on. As you can see, i starts at zero. So this is how enumerate works: it starts with the number zero, and it returns the number of the iteration and the element. And finally, let's loop through elements in a dictionary.
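The two loops described above can be sketched as follows, using the four countries named in the walkthrough. The numbered list is an extra added here only so the pairs produced by enumerate can be inspected afterwards.

```python
countries = ["United States", "India", "China", "Brazil"]

# Plain for loop: `country` is a temporary name that takes each element in turn.
for country in countries:
    print(country)

# enumerate yields (index, element) pairs; the index starts at 0.
numbered = []
for i, country in enumerate(countries):
    print(i, country)
    numbered.append((i, country))
```

If counting from one reads better, enumerate also accepts a start value, as in enumerate(countries, start=1).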
So let's use the dictionary we created before, which was my_data. Well, this is empty now, so I'm gonna use the original dictionary. Here I have the original dictionary, and I'm just going to print it. So this is the dictionary, and now we're gonna loop through it. Let me show you. First, we have to write for, and then we write the key and the value, because one item, as you might remember, is made of a key and a value. So we say for key, comma, value, then in, and then the name of the dictionary. So I write my_data. In order to get the items of this dictionary, we have to use the items method, so we write .items and then parentheses, then we write a colon and press enter. Here we can print the key, and we can also print the value. So key and value. Then we run this code, and as you can see, we get the first key and we get its value: we get name and we get Frank, and then the second key, age, and its value, 26. So this is how you loop through the items inside a dictionary. Okay, now let's see how functions work in Python. A function is a block of code which only runs when it is called. You can pass data, known as parameters, into a function. Here is the syntax of a function. As you can see, first we have to write the keyword def to create the function. Then we have to write the name of the function. Inside parentheses, we define the parameters of the function that we're creating. Then we write a colon, and below you write the code. A function should usually return something, so we use the return keyword and then return something, like a variable for example. So now let's create a basic function. First we write def, and then we write the name of the function. This function is going to do something really simple. It's going to sum the values we pass into it.
So it's going to be named sum_values. As parameters we set a, comma, b, then a colon, and press enter. What this function is going to do is add the a and b values, and we're going to set the result equal to x. So we write x = a + b. As I told you before, you should return something once the function finishes. So we write return, and here we're going to return the x variable. So I write x. And that's it. That's how you create a function. I run this code and, as you can see, apparently nothing happens, but the function was created. To use this function, we have to call it. To call it, we write the name of the function and then pass some parameters, which are called arguments when you call the function. So I'm going to write the arguments 1 and 3. Once you call the function, Python goes to the function definition here and sets this 1 equal to a and this 3 equal to b. So you have one plus three, which is four. So x is going to be equal to four, and then the function returns the value of x, which is four. So this is supposed to return the value four. We run this, and we get four, so the function is working properly. Okay, now let's see some built-in functions that Python has. Python has lots of built-in functions that can help us perform a specific task. Let's have a look at some of them. Let's start with the len function. We only have to write the word len, and then we open parentheses. As you can see here, Jupyter Notebook gives a green color to functions. Now let's calculate the length of the countries list. I have the countries list here, so I'm going to copy it and paste it inside the parentheses. What the len function does is calculate the length of an iterable object such as a list. In this case, the countries list is an iterable object.
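The sum_values function built above can be written out as follows; this is a minimal sketch of the steps in the walkthrough. The variable result is a name added here for illustration.

```python
def sum_values(a, b):
    """Return the sum of the two values passed in."""
    x = a + b
    return x


# Calling the function: 1 is bound to a, 3 is bound to b.
result = sum_values(1, 3)
print(result)
```

Defining the function prints nothing; output only appears once the function is called.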
And now I'm going to run this to calculate the length of this object. So I run this one, and as you can see, the length is four. And this is how the len function works. Now let's see a different function. In this case, I'm going to create a new list that contains only numbers. So I'm going to write random numbers here: 10, 63, 81, then 1 there, and 99. So this is my new list. I created this list with only numbers to try the max and min functions. The max function is this one: we write max and then parentheses, and it returns the item with the highest value in an iterable. My iterable is this list, so we're going to get the highest value of the elements inside it. We run this one, and as you can see, the maximum value is 99. We can also use the min function, which has the opposite effect. In this case, we're going to get the minimum value of the list. So we run it and we get one. Okay, another common function used in Python is the type function, which gives us the type of an object. We only have to write type, and what this function does is return the type of the object. In this case, let's copy and paste the countries object. If we run this, we can see that this object is a list. And that's correct, because we created the list with square brackets. So that's what the type function does. And finally, the last function we're going to see is the range function. This one returns a sequence of numbers that starts with one number and ends at another number. So let's see how it works. This one has three arguments. First is the start number; here I'm going to write one. Then comes the number where the sequence stops; in this case, I'm going to write, let's say, 10. And the last argument is the increment, that is, by how much the sequence grows. In this case, I'm going to say that the sequence grows by two.
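The built-in functions covered so far (len, max, min and type) can be tried together in a short sketch. The countries list is the one from the walkthrough; the exact contents of the numbers list are an assumption, since the spoken list is partly garbled in the captions, but its largest and smallest values match the ones reported.

```python
countries = ["United States", "India", "China", "Brazil"]
numbers = [10, 63, 81, 1, 99]  # assumed values; only the max and min are certain

print(len(countries))   # number of elements in the list
print(max(numbers))     # item with the highest value
print(min(numbers))     # item with the lowest value
print(type(countries))  # type of the object
```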
So I write two. Now I run it, and as you can see, nothing happens; we only get the same text back. But let's make a loop here. I write for i in range, and now print this i. So this is a for loop, which we saw before. Here we run it, and as you can see, we're iterating over this range and getting the elements inside it. The first element is one, the second is incremented by two, so one plus two is three, then three plus two is five, then seven, and then nine. Next we would get 11, but the stop value here is 10, and the stop value itself is never included. So the sequence stops before 10, and we only get up to number nine. And that's how the range function works in Python. And that's it. Now you know the most common built-in functions in Python. Okay, in this video, we're going to see what modules are in Python. In Python, modules are files that contain Python code. A module can have classes, functions, variables, and even runnable code. To get access to a module, we have to use the import keyword, this one. To see a module in action, we're going to look at the os module. This one comes with Python, so you don't need to install it. To get access to the os module, we have to write import os. And that's it. We only write this, and now let's see some functionalities of this module. The first one we're going to see is the get current directory method. To get access to that method, we write os, then .getcwd, and then parentheses. This cwd stands for current working directory. So we're going to get the directory where our Jupyter Notebook file is located, the file I'm working with right now. So let's run it and see what happens. As you can see, here I have the path where the Jupyter Notebook is located. This is the complete path, and you can see it by using the getcwd method. So now let's see another method.
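The range behaviour walked through above can be checked with a short sketch. The list named values is an addition for illustration, so the numbers produced by the loop can be inspected after it finishes.

```python
# range(start, stop, step): start at 1, stop before 10, grow by 2.
values = []
for i in range(1, 10, 2):
    print(i)
    values.append(i)

# On its own, range only shows the range object, not the numbers.
print(range(1, 10, 2))

# Wrapping it in list() is one way to see the numbers without a loop.
print(list(range(1, 10, 2)))
```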
And in this case, we're gonna list all the elements in the folder where this Jupyter Notebook file is located. To do that, we're going to use the listdir method. This means list directory, and I'm going to run it. As you can see here, I have this Jupyter Notebook file that is named Untitled. And these other elements you can ignore; they are just some hidden elements in my folder, and they don't matter. So right now, the only file I have in this folder is this Untitled file. So this is what listdir does: it lists all the elements in the folder where this Jupyter Notebook file is located. And now let's see the last method, which helps us create a new folder. This method is called makedirs. We have to write os.makedirs and then parentheses, and inside the parentheses we write the name of the folder we want to create. In this case, I'm going to name it New Folder. Simple as that. Now if we run this, we see that nothing happens. But if we use the listdir method again to list all the elements in my folder, we can see that there is a new folder. If we compare the result we got before with this new result, we can see that there is one new element, and this element is New Folder, the folder we created using the makedirs method. And that's it. Those are some basic things you can do with the os module. In the following videos, we're gonna install different libraries, packages and modules, so we can do even more things in Python. In this first introduction to pandas, we're going to learn what pandas is, compare pandas with Excel, and then learn what pandas data frames are. So first, pandas is probably the best tool to do real-world data analysis in Python.
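Going back to the os module for a moment, the three calls shown in that part (getcwd, listdir and makedirs) can be sketched together. One assumption to flag: the video creates the folder next to the notebook, while this sketch works inside a temporary directory so that running it leaves no files behind. The names base, before and after are made up for illustration.

```python
import os
import tempfile

print(os.getcwd())  # path of the current working directory

# Work inside a throwaway directory instead of the notebook's folder.
with tempfile.TemporaryDirectory() as base:
    before = os.listdir(base)                      # nothing created yet
    os.makedirs(os.path.join(base, "New Folder"))  # create the folder
    after = os.listdir(base)                       # now lists the new folder

print(before, after)
```

In a notebook, os.listdir() with no argument lists the current working directory, which is how it is used in the walkthrough.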
It allows us to clean data, wrangle data, make visualizations, and more. You can think of pandas as a supercharged Microsoft Excel, because most of the tasks you can do in Excel you can also do in pandas, and vice versa. That said, there are many areas where pandas outperforms Excel. So before you dedicate time to learning pandas and Python, let me show you why you should learn them, especially if you already know Excel. Let's see the benefits that pandas and Python have over Excel. First, limitation by size: an Excel worksheet can handle around 1 million rows, while Python can handle millions and millions of rows. Another benefit that Python and pandas have over Excel is complex data transformation. In Excel, memory-intensive computations can crash a workbook, while in Python, when you work with pandas, you can handle complex computations without any major problem. Also, Python is good for automation. Excel was not designed to automate tasks; you can create a macro or use VBA to simplify some tasks, but that's the limit. Python, however, can go beyond that with its hundreds of free libraries. And finally, Python has cross-platform capabilities. This means that Python code remains the same regardless of the operating system or language set on your computer. Okay, before I start writing code, let me explain to you the core concepts of pandas. We're going to start with the concept of arrays. Arrays in Python are a data structure like lists. You can find one-dimensional arrays and two-dimensional arrays, also known as 2D arrays. The two main data structures in pandas are series and data frames. The first, a series, is a one-dimensional array, while the second, a data frame, is a two-dimensional array. In pandas, we mainly work with data frames.
But if the definition of a data frame in terms of arrays wasn't very clear, let me show you another definition, this one using Excel. A pandas data frame is the equivalent of an Excel spreadsheet. Pandas data frames, just like Excel spreadsheets, have two dimensions, or axes. One axis is the rows and the other is the columns. A column is also known as a series. So the one-dimensional array we saw before, the series, is a column; it's another name for the columns in a pandas data frame. On top of the data frame, you will see the names of the columns. And on the left side, there is the index. By default, the index in pandas starts at zero. The intersection of a row with a column is called a data value, or simply data. We can store different types of data such as integers, strings, Booleans, and so on. Right now, you see on the screen a data frame that shows the US states ranked by population. I'm going to show you the code to create a data frame like this later. But now let's analyze this data frame. The columns are also known as features. So our features here are states, population, and postal, while each row is known as an observation. We can say that there are three features and four observations, because there are three columns and four rows. Keep in mind that a single column should have the same type of data. In our example, the states and postal columns only contain strings, while the population column only contains integers. We might run into errors or unexpected results when mixing different data types in a column, so avoid mixing different types of data. So now let's see the terminology translation between Excel and pandas. As I mentioned before, in Excel we work with worksheets; in pandas, we work with data frames. The columns in Excel are known as series in pandas, but we also often say the word columns.
And in pandas we work with the index. The index is those numbers that are on the left. In pandas we also say rows; you will hear about rows with observations too, but rows is fine. And finally, in pandas we often work with NaN, which stands for not a number. This is the equivalent of an empty cell that you might find in Excel. So that's it for now. In the next video, we're going to learn how to create a pandas data frame from scratch. Welcome back. In this video, we're going to learn different ways to create a pandas data frame. As you might remember, a data frame looks like this. It has columns and rows, and the columns are series. Series are 1D arrays, and arrays are the first way to create a data frame. So these are arrays: we have 1D arrays and 2D arrays. 1D arrays are basically columns, while 2D arrays are data frames. Usually, to use arrays, we use a library named NumPy, and NumPy is what is under the hood of pandas. To use NumPy, we first have to import it. We're going to do that a bit later when we write code. But just to give you an idea of what a NumPy array looks like, here I wrote a basic array. We have to use np.array to create the data frame that you see on the right. And well, this is one way to do it. You can also use lists, as I'm showing you right now. With this second option, when you create a data frame with lists, you don't need to use NumPy arrays, because the nested lists already work as a kind of array. We're going to write the code to create a data frame with arrays later. But let's see the second option to create a data frame. The second option is dictionaries. As you might remember, a dictionary has a key and a value. So we can use the key as the column name and the value as the data. The value can be a list.
So this data will be many elements inside a list. A key-value pair is known as an item in a dictionary; in this case it's going to be a series, because what we have here is one column. So this is the second way to create a data frame, with dictionaries, and we're gonna see that with code a little bit later. But now let's see the third way, which is with CSV files. CSV files are files that can be opened in spreadsheets like Excel. This is the easiest way to create a data frame, because we only need to read the CSV file and then the data frame is created. And that's it. So now let's go to Jupyter Notebook to create a data frame by writing some code. Okay, now we are in Jupyter Notebook. Here we're going to write the code to create a data frame, and we're going to use the three ways I showed you before. The first thing we're going to do is import the libraries we're going to use to create a data frame. That's the first line of code, and I already wrote it here. First we import pandas, and then we import NumPy. So import pandas as pd; pd is the convention for naming pandas, and np is the usual way to name NumPy. To run this code, just press Ctrl+Enter, and with that we've imported pandas and NumPy. So let's see the first way to create a data frame, which is with arrays. To create an array, we have to use NumPy. This is the first option. So we write np, which is the short name for NumPy, and then we use the array function. So we write .array, open parentheses, and inside we write the array we want to create. I'm going to write random numbers just for the sake of this example. So I open double square brackets. Then let's write, let's say, one and four, then two and five, and the last one is going to be three and six. Each pair is, let's call it, a list. Actually, they are lists, and each list represents a row.
So this is going to be the first row, this is going to be the second row in our data frame, and this is going to be the third row. We can give this array a name, and I'm going to name it data. So data is equal to this NumPy array. I'm going to execute this code, and now we have this data. So we created the array using NumPy. Now let's create a data frame with pandas. To do that, we have to write pandas; in this case I can write pd, because that's how I named it in my first line of code. So I write pd, and then to create a data frame we use the DataFrame method. So we write .DataFrame and then open parentheses. Here we have to fill in some arguments. The first one, and something you should always include in the DataFrame method, is the data, because there is no point in a data frame without data. So first we include the data: copy our array, and paste it here. That's the first argument. You can create the data frame as it is. I'm going to show you, so I press Ctrl+Enter. As you can see, here's my data frame. But it's full of numbers: the column names are numbers and the row names are numbers too. To make it more understandable, we can rename the column names and the row names, or index; the row names are actually called the index. First, we can rename the index. You only need to add the index argument, as I'm writing right now, and then specify the names you want to set. This second argument takes the form of a list. The first element is going to be the first index. Here it's zero, so in case you don't want it to be zero, you can set another name.
So in my case, I'm going to set it as row1, then a comma to set the second index as row2, and the third as row3. Now we can also modify the column names. We have to use the columns argument, so here we write columns. Then we open square brackets, because what we're adding here is a list. In this case, we only have to modify two elements. I want to name the first col1 and the second col2. So I'm going to write this. And actually, I'm gonna name this data frame. I'm going to assign it to a variable, and this is going to be df. df is the common way to name a data frame; it stands for data frame. So I'm going to run this code now, and as you can see, it ran. Now to show the data frame I can write df here, and now we have the data frame. As you can see, the first row, 1 and 4, is my first list, and the second row is my second list. And the first column, well, that's a series, as we discussed before. We also have the column names that we modified, and the row names. So now let's quickly see how to create a data frame with arrays, but in this case without NumPy. I'm going to copy this line of code and paste it here as option two. This is the base of these arrays in list form, and I'm just gonna delete this part: I don't want the NumPy array anymore, just the double square brackets. So I run this. Now, creating the data frame works the same way as before. So just copy this and paste it here. I run this, and now I can write df and execute this code. As you can see, we have the same result. I'm just showing you this second way so you don't have to worry about learning NumPy right now. Okay, now let's create a data frame from a dictionary.
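Before moving to dictionaries, the two array-based options just covered can be sketched side by side. The values and labels are the ones used in the walkthrough; the name df_from_lists is an addition for illustration, so both frames can be compared.

```python
import numpy as np
import pandas as pd

# Option 1: a 2D NumPy array, where each inner list is one row.
data = np.array([[1, 4], [2, 5], [3, 6]])
df = pd.DataFrame(
    data,
    index=["row1", "row2", "row3"],
    columns=["col1", "col2"],
)
print(df)

# Option 2: the same nested lists passed directly, no NumPy needed.
df_from_lists = pd.DataFrame(
    [[1, 4], [2, 5], [3, 6]],
    index=["row1", "row2", "row3"],
    columns=["col1", "col2"],
)
print(df_from_lists)
```

Leaving out index and columns gives the default labels, which count up from zero.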
And we're gonna use lists in this example, and we're going to create a data frame using more meaningful data. In this case, to create a dictionary, I'm going to use two lists. The first is going to be a list named states, and the second is going to be population, which will contain the population of each state. So the first list is states, and I'm gonna write it here. I open square brackets, because this is a list, and now I write some states in the US. The first is California. The second is going to be Texas, let me write it here. The third is going to be Florida, and the last one, New York. So I quickly write it here. And now I'm going to create a population list. In this case, I'm going to paste this data, which is the population of each state. Now I'm going to create a dictionary from these two lists. So I'm gonna write the name of the dictionary, which is going to be dict_states. This is a dictionary, so I should use square brackets, sorry, curly braces. And now I'm gonna set the name of the key. The first key is States, then a colon, and now the value, which is the states list. That's the first value. The second key is Population; I'm just gonna write it with a capital letter. And its value is the population list that we have here. With this, we've created our dictionary. So I'm gonna run these two cells, and now we have the lists and the dictionary. Now we can easily create a data frame using the DataFrame method that we used before for the first option, when we created a data frame from an array. To do it, just write pd, then .DataFrame, and inside the parentheses we write the name of the dictionary. So I'm going to copy dict_states. And I'm going to assign this to a new variable, which I'm going to name df_population. So a data frame about population.
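The dictionary approach described above can be sketched like this. The population figures are rough placeholder values chosen for illustration, not the numbers pasted in the video; the names dict_states and df_population follow the walkthrough.

```python
import pandas as pd

states = ["California", "Texas", "Florida", "New York"]
# Rough placeholder figures for illustration only, not the video's data.
population = [39_000_000, 29_000_000, 21_000_000, 19_000_000]

# Each key becomes a column name and each list becomes that column's data.
dict_states = {"States": states, "Population": population}

df_population = pd.DataFrame(dict_states)  # note the capital D and F
print(df_population)
```

Both lists need the same length, since each position across the lists forms one row.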
So now I run this, and here I get an error because I didn't write DataFrame correctly. The D and the F are capital letters. So I run it again, and now everything is okay. Now to show the data frame, I paste this one here and run it. So here we have the data frame. As you can see, my first key, States, is the name of my first column, and the data inside the states list is here. So here is my first column, or my first series, and the same goes for Population with its data. So here we created a data frame using a dictionary. Okay, finally, let's create a data frame from a CSV file. To create a data frame from a CSV file, we have to use the read_csv method. First, as usual, we write pd, which stands for pandas. Then we use the method, so we write .read_csv, open parentheses, and then we have to write the name of the CSV file. Here I'm going to paste the name, which is StudentsPerformance.csv. To download this data, you can check the notes of this video. And actually, we can have a look at this data before importing it into pandas. I have it here in Google Sheets. As you can see, we have the scores of some exams, math, reading, and writing, and we have some other data. We can import all of this data, all of our 1000 rows, into pandas. So all of this is going to be here. Here, we only have to define the name of this data frame, and I'm going to name it df_exams. Now I run it. To show the first five rows of this data frame, we can use a method named head that we're gonna see later. But just to give you an idea, we can write .head, and we get the first five rows. As you can see, here we have the first five rows of this Excel, or actually CSV, file. And you can see here, for example, that the first row says female, group B, and math score 72. So let's check if that data is the same here.
So we have female, group B, and math score 72. So we have all this data here in this data frame. If we want to see all of the rows, we can leave out .head, and now we have all the rows. Well, here we cannot see part of the rows; I'm going to show you how to see that part later in this course. But as you can see, if we run df_exams, we get something like a summary of this dataset, or well, data frame in this case. By the way, when we work in pandas, or actually in Python, we usually call these CSV files datasets. And when we read a dataset using pandas, the result is a data frame, which is what we have here. So the CSV file is a dataset, and this, when we read it with pandas, is a data frame. And that's it. These are the three ways to create a pandas data frame. Okay, now it's time to see how to display a data frame in pandas. Here I have the CSV file we used before to create a data frame. A little detail I forgot to mention before is that this CSV file should be located in the same directory where your Jupyter Notebook script is located. What I mean by Jupyter Notebook script is what we're seeing right now, the file that we're working in right now. So what you have to do is download the CSV file and place it in the same folder where your Jupyter Notebook script is located. This is how you're going to read the CSV file using the read_csv method. So just make sure both the CSV file and the Jupyter Notebook script are in the same folder. Okay, now I'm going to run these first two lines of code that we've seen before. The first imports pandas and the second reads the CSV file. So I run this, and now the CSV file is stored in df_exams.
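As a hedged sketch of reading a CSV into a data frame, the example below writes a tiny stand-in file and reads it back. The file name exams_sample.csv, the column names and the second row are hypothetical; only the first row (female, group B, math score 72) echoes values mentioned in the walkthrough. The real exam file is not reproduced here.

```python
import os
import tempfile

import pandas as pd

# Tiny stand-in dataset; column names and the second row are hypothetical.
csv_text = "gender,group,math score\nfemale,group B,72\nmale,group A,65\n"

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "exams_sample.csv")  # hypothetical file name
    with open(path, "w") as f:
        f.write(csv_text)
    df_exams = pd.read_csv(path)  # the dataset becomes a data frame

print(df_exams.head())  # first rows (up to five)
```

When the CSV file sits in the same folder as the notebook, passing just the file name to read_csv is enough; otherwise a full path is needed, as in this sketch.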
This is my data frame. Now let's see how we can display it. The easiest way is to copy this variable name and paste it here. Now I execute this, and we have the data frame. Actually, this is a summary of the data frame, because not all the rows are shown. If we scroll down a little bit, we can see that there are 1000 rows and 8 columns. Here we can see the rows and the columns, but as you can see, in the middle we cannot see the rows: it goes up to row 4 and then continues with row 995. Usually when we work with pandas, we don't need to look at the data row by row. That's not how we do it with pandas. But if for some reason you need to see all the data in pandas, as you would in Excel or in Google Sheets, I'm going to show you a way to do it a bit later. First, I'm going to show you the different ways we usually display a data frame in pandas.
The first way is the head method. To use it, we only have to write the name of the data frame, in this case df_exams, then .head, then parentheses. We run this, and this is how we get the first five rows of a data frame. As you can see, we have row 0 to row 4. So this is the head method. In the same way, we can get the last five rows of this data frame by using the tails method. Here we write again the name of the data frame, the same df_exams, and then .tails, then parentheses, run this, and actually, I think it's tail. Yeah, it's tail, in the singular. And now we get the last five rows, from 995 to 999. So these are the last five rows.
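A minimal sketch of head and tail on a small stand-in data frame (20 made-up rows rather than the 1000 in the course file) is shown here. It also checks that the plural spelling from the slip in the video is not something a data frame offers.

```python
import pandas as pd

# Stand-in data frame with 20 rows (hypothetical values, not the course data).
df = pd.DataFrame({"math score": range(50, 70)})

first_five = df.head()  # rows 0 to 4
last_five = df.tail()   # rows 15 to 19

print(first_five)
print(last_five)

# The plural spelling is not a DataFrame attribute, so calling it raises an error.
print(hasattr(df, "tails"))
```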
Now, in case you want to get more rows, not only the first five or the last five, you can add an argument to either the head or the tail method. I'm going to use the head method as an example. I copy this and paste it here. Let's say we want to get the first 10 rows, so we write 10 inside the parentheses. Now we run this, I scroll down, and we can see that the first 10 rows are here. We can do the same with tail. Here I write tail, and as we can see, the last 10 rows are displayed. So you can specify the number of rows that you want to display, and that's how you do it.
Now I'm going to show you how to display all the rows of this data frame, as you would in Excel or in Google Sheets. To do so, first we have to know how many rows this data frame has. An easy way to get the number of rows and columns is the shape attribute. First we write the name of the data frame, in this case df_exams, and then, to get access to the attribute, we use the dot and then the name of the attribute, in this case shape. Now we run this, and we get (1000, 8). The first is the number of rows, and the second is the number of columns. So we have 1000 rows. Now, to display all the rows, we have to use the set_option function. We write pd.set_option, and inside the parentheses our first argument is display.max_rows, written in quotes. Then we have to specify one more argument, which is the number of rows we want to display. Here it's 1000, because we have 1000 rows, and we run this. As you can see, nothing happened, because we only modified the default behavior of pandas. If we want to see the data frame, we just write its name and execute it.
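The steps just described can be sketched as follows, using a made-up single-column data frame with 1000 rows in place of the course data. The final reset_option call is an addition for tidiness and is not part of the video.

```python
import pandas as pd

# Stand-in data frame with 1000 rows (hypothetical values).
df = pd.DataFrame({"score": range(1000)})

first_ten = df.head(10)  # first 10 rows instead of the default 5
last_ten = df.tail(10)   # last 10 rows

# shape is an attribute (no parentheses): (number of rows, number of columns).
n_rows, n_cols = df.shape

# Let pandas display up to n_rows rows instead of truncating the middle.
pd.set_option("display.max_rows", n_rows)
max_rows_set = pd.get_option("display.max_rows")
# Showing df in a notebook cell would now list every row.

# Restore the default so later output is truncated again.
pd.reset_option("display.max_rows")
```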
I'm going to scroll down, and as you can see, all the rows of this data frame are here. I scroll all the way down, and it says 999, so all the rows are displayed. And that's it for this video. In the next video, I'm going to show you the different attributes, methods, and functions a data frame has in pandas.
Welcome back. In this video, we're going to see some basic attributes, methods, and functions that we can use in pandas. But first, let's learn what each of them is. First, attributes are values associated with an object, and they are referenced by name using dotted expressions. So to get to an attribute, we have to use the dot. For example, below you can see that we have a data frame named df, and to get its columns we write df.columns. So columns is an attribute, and that's how we get this attribute of the data frame. Next we have functions. A function is a group of related statements that performs a specific task. We've seen functions before. In Python, we've seen some built-in functions like max, which gets the maximum value of a list, min, which gets the minimum value, or len, which gets the length of a list. Those are some Python built-in functions that we can use with pandas too. And finally, methods are functions that are defined inside a class body. We haven't talked about classes, because they're not the main topic of this course, so just keep in mind that methods are functions inside a class. When the creators of pandas built pandas, they used many classes, and the functions inside those classes are known as methods. For example, below you can see the head method, and we've also seen the tail method and some other methods so far. As a rule of thumb, when we use methods, we have to write the parentheses.
But when we want to get access to attributes, we only write the dot and the name of the attribute. So a method is written with the dot and parentheses, and an attribute with only the dot and the name of the attribute. Enough talk; now let's write some code in Jupyter Notebook. Here we're going to use the same CSV file we used in the previous video. We import pandas as we did before, then we read the CSV file with read_csv, and we show the data frame simply by writing its name. We've seen this before; I'm just reminding you. Now we'll see some basic attributes, methods, and functions that we can use in pandas. First, let's check some attributes of this data frame. I'm going to copy the name of the data frame. The first attribute is going to be shape. We've seen this before, I believe. To get to the attribute, we write the dot and then the name of the attribute, so df_exams.shape, and we get two numbers. The first is the number of rows, and the second is the number of columns. So that's good. The next attribute is the index attribute. As you might expect, we only have to write the name of the data frame, then the dot, and then index. This is how we get the index of this data frame. As you can see, it has the form of a range. A range, as you might know, takes a start, a stop, and a step. The first is the start, and in this case it starts at 0. The second is the stop, and here it stops at 1000. This is true because my data frame starts with 0 and finishes with 999; the stop is 1000 because a range stops one before its stop value. And here it increases by one, so 0, 1, 2, 3, and so on, so the step is 1. So this is my index attribute. So now let's continue.
And now let's get access to the columns attribute. To do so, we write the name of the data frame and then the name of the attribute. In this case, columns has to be written with an s, in the plural. We run this and we get the names of the columns. As you can see, we have eight columns: gender, race/ethnicity, and so on. We can even use this attribute to modify the names of the columns, but we'll see that later. Now let's see how we can obtain the data type of each column. To do so, we have to use the dtypes attribute. We write the name of the data frame again, and then dtypes, and this gives us the type of each column. Gender is object, and actually everything from gender to test preparation course is object, while math score, reading score, and writing score are integers, so numbers. As a rule of thumb, anything that says object is some kind of string. I'm going to bring the data frame back so you can see this better. As we've seen before, the columns from gender to test preparation course have the type object, and as we can see here, all of them are strings. So we can say that objects are the same as strings here. And anything that has a score here represents some kind of number, which is why we get integers, int64. So these are the most common attributes of a pandas data frame. Now let's review some methods. First, let's see the first five rows. As you might know, that's done with the head method. We write only the name of the attribute, sorry, the name of the data frame, and then the head method, so head and parentheses. We run this and we obtain the first five rows. We can also obtain a summary of the data frame by using the info method. Here we write the name of the data frame, then .info, then parentheses, and execute this. So here, we have some information about this data frame.
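The attributes and methods covered so far can be tried on a tiny stand-in data frame, as in the sketch below. The three rows are placeholders, and the dtype printed for text columns is an assumption: it depends on the pandas version (object in the video, possibly a dedicated string type in newer releases).

```python
import pandas as pd

# Tiny stand-in for df_exams (placeholder rows).
df = pd.DataFrame({
    "gender": ["female", "female", "male"],
    "math score": [72, 69, 47],
    "reading score": [72, 90, 57],
})

# Attributes: dot plus name, no parentheses.
print(df.shape)    # (rows, columns)
print(df.index)    # a range with start, stop and step
print(df.columns)  # column names
print(df.dtypes)   # data type of each column

# Methods: dot plus name plus parentheses.
print(df.head())
df.info()          # column types and non-null counts
```

The index of this sample starts at 0, stops at 3, and has step 1, which mirrors the start 0 and stop 1000 seen for the full file.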
And here we have, again, the data types, and also how many rows are non-null. As you can see, all the data that we have in this data frame is non-null, so there isn't any empty data in this data frame. Okay. Now, if we want to get some basic statistics of a data frame, we have to use the describe method. We write the name of the data frame and then .describe with parentheses. We run this code, and we have some basic statistics. First, the count, which indicates how many rows each column has; each of them has 1000 rows. Then we have the mean, which is basically the sum of the numeric data in each column divided by 1000, because there are 1000 rows. Then the standard deviation and the minimum value; for example, in math score the minimum value was 0. Then 25%, 50%, and 75% are the percentiles: Q1 is 25%, Q2 is 50%, and Q3 is 75%. Then we have the maximum value on each exam, and we see that the maximum score is 100 on each of them. So the describe method is a useful method whenever we want to get some basic statistics of the data frame, especially of the numerical data in it. Okay, now let's see some functions that we can use. In pandas, we can use some of the built-in functions that Python has. For example, if we want to get the length of a data frame, we only have to write len, and then inside parentheses the name of the data frame. We run this, and we obtain that the length of this data frame is 1000. Actually, the length of a data frame indicates only the number of rows. So here I made a mistake: it's rows. And this is how we obtain the number of rows of a data frame. We can also use other built-in functions that Python has, like the max function. We write max, then the name of the data frame, and we run. In this case, we didn't get anything meaningful, because we get something like a string.
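To make the describe output concrete, here is a small sketch on four made-up rows. With so few values, the count and mean are easy to check by hand: 72 + 69 + 47 + 90 = 278, and 278 / 4 = 69.5.

```python
import pandas as pd

# Four placeholder rows standing in for the numeric columns of df_exams.
df = pd.DataFrame({
    "math score": [72, 69, 47, 90],
    "reading score": [72, 90, 57, 95],
})

stats = df.describe()  # count, mean, std, min, 25%, 50%, 75%, max
print(stats)

# len on a data frame gives the number of rows only.
n_rows = len(df)
print(n_rows)
```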
But if we write the index here and then max, as you might remember, this attribute gives us the index values. So if we use the max function, we're going to get the maximum, or highest, index. We run, and it's 999. We can also get the lowest index of a data frame. We only have to copy this, and instead of writing the max function, we write min. In this case, we get the minimum index, and it is 0. Now we can obtain the type of the data frame. Well, the data frame has the DataFrame type, but we can verify that using the type function. We write type, and then inside the parentheses the name of the data frame, and we run. Here you can see that the type of this object is a data frame. And finally, we can use a common function, the round function. We write round, and this takes two arguments. The first is the object that we want to round, which in this case is our data frame, and the second argument is the number of decimal places that we want to have. In this case, I want two decimal places, so we run this. We're not going to get this number of decimal places in this particular example, because the numerical data we have here are integers, not floats, so this doesn't have any effect. But if you have a data frame with float numbers, you can round those numbers using the round function. And that's it. These are the most basic attributes, methods, and functions that we will see often in pandas.
Alright, now it's time to learn how to select a column from a data frame. Here I have the same CSV file we've been using in the previous videos. Let's import pandas and read this CSV file. I have it in the same data frame, and I'm just showing the first five rows. Now, to select one of the columns of this data frame, we have two options. Let's see the first option. The first option is using the square brackets.
This is the preferred way to select a column in pandas. Let's see how to select the gender column, the first one here. The first thing we have to do is write the name of the data frame, in this case df_exams, and then open square brackets. Now we have to write the name of the column. We open quotes, and I'm going to copy the name of this column and paste it here. So we have the name of the data frame, and then the name of the column we want to select. Now we press Ctrl+Enter to run this code, and as we can see, we have the first column of this data frame. As you might expect, this is an array, a 1D array. And as we discussed in previous videos, 1D arrays are series. We can verify that this is true with the type function. I'm going to copy this column selection, and now we use the type function: we write type, open parentheses, and inside the parentheses we write the object we want to evaluate, in this case this selection. Now we run this, and as you can see, we get a Series. A series, just like a pandas data frame, has attributes and methods, so we can access those attributes and methods. Actually, the attributes and methods of a series and of a data frame are very similar. For example, if we want to get the index attribute of this series, we only have to write the name of the series, then the dot and the name of the attribute, so index. We run this, and we get the index in the form of a range that starts at 0 and stops at 1000. Another method that data frames and series share is the head method. We can also get the first five rows by writing .head and parentheses. As you can see, we get the first five rows of this series.
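The square-bracket selection walked through above looks like this on a small stand-in data frame (the rows are placeholders). The sketch also confirms that the result is a Series and that it carries the same kind of index and head method as the data frame.

```python
import pandas as pd

# Placeholder rows standing in for df_exams.
df_exams = pd.DataFrame({
    "gender": ["female", "female", "male"],
    "math score": [72, 69, 47],
})

# One pair of square brackets with the column name in quotes.
gender = df_exams["gender"]

print(type(gender))   # a pandas Series
print(gender.index)   # same style of range index as the data frame
print(gender.head())  # Series share the head method with data frames
```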
Alright, that's it for the first syntax. This is my favorite syntax, and actually most people use it because it's the most practical. Now it's time to see the second syntax to select a column from a data frame. This syntax involves writing the dot. Let's say we want to get the same gender column. We write the name of the data frame, followed by the dot and the name of the column, so gender. In this case, we don't need to open quotes, and we don't need the square brackets. We run this code, and we get the same series. Now you might be thinking that this is more practical than the first syntax, but this syntax has some pitfalls. Let me show you. What if you want to get a column whose name has two words? For example, this column named math score. Let's try to get access to this column. I'm going to copy this column name and scroll down. Now let's try. I write first the name of the data frame, and then the dot. To select this column, we have to write the column name, so this is the column name. But as you can see, if I run this, we get an error, because Python doesn't work like that. In Python, when a name has two words, we usually join them with an underscore. That's how Python understands that this is a single name. If it's written like this, Python will not understand what you're trying to do. However, if you use the first syntax, the square brackets, you don't have this problem. Let me show you. I'm going to copy this and paste it here, and instead of having only the dot notation, I'm going to open square brackets and then add the quotes.
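The pitfall with the dot syntax can be reproduced without crashing a script, as in the sketch below. Because df_exams.math score cannot even be parsed, the example hands that text to Python's built-in compile function and catches the SyntaxError. The compile trick is only a device for this illustration and does not appear in the video, and the data rows are placeholders.

```python
import pandas as pd

# Placeholder rows standing in for df_exams.
df_exams = pd.DataFrame({
    "gender": ["female", "female", "male"],
    "math score": [72, 69, 47],
})

# For a one-word column name, dot notation and brackets give the same series.
same_series = df_exams.gender.equals(df_exams["gender"])

# A column name with a space is not valid after the dot.
try:
    compile("df_exams.math score", "<example>", "eval")
    dot_notation_parses = True
except SyntaxError:
    dot_notation_parses = False

# Square brackets with the name in quotes work for any column name.
math_score = df_exams["math score"]

print(same_series, dot_notation_parses)
print(math_score)
```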
As you can see, the column name is now a string, and Python knows that this is a string. Now, if you delete the dot and execute this, you get this column without any error. So this is one of the advantages that the square brackets have over the dot. And that's it. In this video, we learned how to select one column from a data frame, and in the next one, we're going to learn how to select two or more columns from a data frame.
Okay, in this video, we're going to learn how to select two or more columns from a data frame. As usual, we start by importing pandas and reading the CSV file we've been using so far. We execute these two lines of code, and we get the data frame. What we're going to do in this video is select two columns from this data frame. First, let's pick some columns. I want to select the gender column and also the math score column. To select these two columns, we have to use the square brackets again. In this case, we have to use two pairs of square brackets to select two or more columns. To do this, first we write the name of the data frame, df_exams, and then we open square brackets twice, so we have two pairs of square brackets. Inside, we write the names of the columns we want to select. We said that we wanted the gender column, so we write gender. The second column that we chose was math score, so I open quotes and write math score. So here I have these two columns. By the way, the order in which we write these columns is the same order that we're going to get in the resulting data frame; I mean, we can define the order of the columns inside the square brackets. Here we're saying that first comes the gender column and second the math score column. So now, let's run this.
And as you can see, we obtained first the gender column and second the math score column. Here we can see that it's a data frame, and the rows go up to index 999. Now, we can verify that this is actually a data frame by using the type function. I'm going to copy this selection and paste it here, then we wrap it in the type function with parentheses and execute this code. As you can see, we get that this is a DataFrame. One little detail I want to tell you is that when we use two pairs of square brackets, we're always going to get a data frame, but when we use only a single pair of square brackets, as we did in the previous video, we get a series. So one pair of square brackets gives a series, and two pairs of square brackets give a data frame. Okay, to continue with the video, I'm going to select more than two columns using these two pairs of square brackets. Let's choose the columns we're going to get. In this case, I'm going to get the gender column and all the scores that we have here: math score, reading score, and writing score. To do so, first I'm going to copy this first selection to have it as a reference, and paste it here. So far we have two columns, so let's add the two remaining columns. An easy way to write these column names is to copy them from the data frame and paste them here instead of typing them. Now I delete the extra text and put each name inside quotes, and here we have it. As I said before, we can change the order of the columns. For example, I cut this, and let's say we want to have the writing score nearer the beginning. So here, I paste writing score.
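A short sketch of the two-pairs-of-brackets selection, again on placeholder rows: the inner pair is an ordinary Python list of column names, and the order of that list decides the order of the columns in the result.

```python
import pandas as pd

# Placeholder rows standing in for df_exams.
df_exams = pd.DataFrame({
    "gender": ["female", "female", "male"],
    "math score": [72, 69, 47],
    "reading score": [72, 90, 57],
    "writing score": [74, 88, 44],
})

# Two pairs of brackets: the result is a data frame, even for a single column.
two_columns = df_exams[["gender", "math score"]]

# The order inside the list defines the column order of the result.
reordered = df_exams[["gender", "writing score", "math score", "reading score"]]

# One pair of brackets: the result is a series.
one_column = df_exams["gender"]

print(type(two_columns), type(one_column))
print(reordered)
```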
And now what we're going to get is first the gender column, then the writing score column, and then the math score and reading score columns. Let's run this code, and as you can see, we have the data frame in the order that we defined here.
Okay, now you might be wondering whether there is a way to select two or more columns using the dot, so let's check if that's possible. For example, let's say we want to get the gender and math score columns using the dot notation. Here I have it, and as you can see, this doesn't look right, because you have two strings separated by a comma, but you don't have a list and you don't have square brackets. This is probably going to fail, so let's check. I'm going to run this code, and as you can see, we get invalid syntax, so it's a syntax error. So we cannot select two or more columns with the dot, and this is one of the disadvantages the dot has compared with the square brackets. This is why most people prefer to use the square brackets instead of the dot notation. And that's it for this video. In this video, we learned how to select two or more columns from a data frame.
Okay, in this video, we'll see different ways to add a new column to a data frame. Here's the same students performance data frame, and as you can see, we have three columns with scores: math score, reading score, and writing score. Let's say we want to add a new score; in this case, let's add a language score. To add a new column in a spreadsheet like Google Sheets or Microsoft Excel, we would simply insert a new column, and that's it. But in pandas, we have different ways to insert a new column. Let's see how to do it. First, let's add a new column with a scalar value. A scalar value is simply a single value.
In this case, the column is going to have one single value, so all the rows are going to have the same value. To do so, we have to select this imaginary column, because this column doesn't exist so far. What we're going to do is select it as we would any other column. First we write the name of the data frame, in this case df_exams, and then we open square brackets and quotes, as we would for any column. Here, instead of writing, for example, math score, we write the name of the column we want to create. In this case, let's write language score. This is the new column we want to create, and now we have to assign a value, a scalar value, to this new column. In this case, I'm going to assign a value of 70. If we run this code, we're going to see that nothing happens, but if we now show the data frame, we're going to see that we have a new column named language score. The value of this column is the same in all its rows: we have 70 in row 0, and if we scroll down, we're going to see that it's 70 in all the rows, even in row 999. But it's a bit weird for all the students in an exam to have the same score. What you will usually do is add different values to this column. To do this, we have to use arrays, and to create arrays we have to use NumPy. So in the second way to add a new column, we're going to use arrays. First we have to see how many rows this data frame has. In this case, it has 1000 rows. This is important because the number of rows has to match the length of the array we're going to create. So let's create this array. First let's import NumPy: we write import numpy as np. We run this code, and now NumPy is imported.
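Adding a column from a scalar value, as described above, can be sketched on a few placeholder rows like this. Selecting a column name that does not exist yet and assigning to it creates the column.

```python
import pandas as pd

# Placeholder rows standing in for df_exams.
df_exams = pd.DataFrame({"math score": [72, 69, 47]})

# Assigning a single value fills every row of the new column with it.
df_exams["language score"] = 70

print(df_exams)
```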
Now we have to create an array of 1000 elements. To do so, we're going to use a function called arange. It's written like this: a-r-a-n-g-e. It gives us a range of numbers that starts with the first argument, where I'm going to write 0, and stops before the last argument, which in this case is going to be 1000. So these are the limits of my range. I execute this, and as you can see, it starts with 0 and goes up to 999. To verify the length of this range, we use the len function. As you can see, the length is 1000, so the array has 1000 elements. Now I'm going to assign this to a new variable, and I'm going to name this variable language_score. We execute this, and here I was planning to check the length of this array, so I quickly do it as we did before, with len, and we get the length of the array. Now we have to add a new column to the data frame with this array. To do that, we use the same approach as before. First we write the name of the data frame, and then we make the selection. This selection is going to be the new column. Well, in this case it's not new, because we already created it, but let's imagine it's a new column. So it's language score. Now we have to assign the array to this column, so we write language_score here and assign it to this column. To see the result, we show the data frame. As we can see, we have a new column, and this column starts with 0 and ends with 999. So it doesn't have a single value anymore; it has a range of values. Now, there is a little detail we have to take care of. Scores are supposed to be between 0 and 100, and here we have values from 0 to 999. Also, here we have a sequence of numbers: it starts at 0, then 1, and it increases by one.
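Here is a hedged sketch of the array-based approach on five placeholder rows instead of 1000. It also shows why the lengths have to match: assigning an array of the wrong length raises a ValueError. The too-short array is an illustration added here and is not from the video.

```python
import numpy as np
import pandas as pd

# Placeholder rows standing in for df_exams.
df_exams = pd.DataFrame({"math score": [72, 69, 47, 90, 76]})

# arange(start, stop): start is included, stop is not.
language_score = np.arange(0, len(df_exams))
print(len(language_score))

# Same selection syntax as before, now assigning an array instead of a scalar.
df_exams["language score"] = language_score

# An array whose length differs from the number of rows is rejected.
try:
    df_exams["too short"] = np.arange(0, 3)
    length_mismatch_allowed = True
except ValueError:
    length_mismatch_allowed = False

print(df_exams)
```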
And usually in scores, you will see that students have random scores. So we have to create an array with random numbers. To do that, we have to use NumPy again, but with a different function, named random.randint. Let's write it here: np.random.randint. The first argument is the lowest value of these random numbers. By the way, these are random integers, because scores are usually integers. In this case, I'm going to set this to 1. The second argument is the upper limit of these random numbers, and I'm going to set it to 100. The third argument is the size. In this case, we want an array of 1000 elements, so we set the size to 1000. Now we run this, and I'm not going to inspect this array again; I'm just going to check that it has the length we want by using the len function, and here we have 1000 elements. Now let's store this in a new variable. I'm going to name it int_language_score. One little detail you should know is that the first argument is inclusive and the second one is exclusive. This means that if we get the minimum value of this new array, we get that the minimum value is 1, because the first argument is inclusive, which means that it can be included in the array. However, if we print the maximum value of this array, we're going to see that 100 is not there, because it's exclusive, which means that the second argument is never included in the array. Okay, finally, let's insert these random integers in the new column that we created. We just have to use the same approach as before. So here, I copy, and now I paste it.
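The inclusive lower bound and exclusive upper bound can be checked directly, as in this small sketch. Since the values are random, the exact minimum and maximum vary from run to run; only the bounds are guaranteed.

```python
import numpy as np

# 1000 random integers: 1 is the lowest possible value, 100 is never produced.
int_language_score = np.random.randint(1, 100, size=1000)

print(len(int_language_score))
print(int_language_score.min())  # never below 1
print(int_language_score.max())  # never above 99
```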
So here, instead of assigning language_score, I'm going to use int_language_score. I run this code, and as you can see, we have the same column, and now its data is random integers from row 0 to row 999. This new data looks more like real scores, because these are random numbers between 1 and 99. And that's it. Now, one more little detail I want to share with you is how to create random float numbers. Before, we created random integers, but if for some reason you want to create random float numbers, there is a way to do it with NumPy. We write np, then .random, then .uniform, and the arguments are the same: the minimum value, then the maximum value, then the size, which is 1000. Then you run this, and it's similar to what we got before, but now we have float numbers. And that's it. In this video, we learned different ways to add a new column to a data frame.
Alright, now it's time to see some operations we can perform on data frames. Here we have the same data frame, df_exams, and we can apply some common operations to the numerical columns like math score, reading score, and writing score. Let's see how to do this in pandas. First, we're going to see how to make operations on columns. Our first task is to calculate the total sum of a column. Let's pick math score and calculate the sum of this column. To do that, first we have to select the column. As you might remember, to select a column we write the name of the data frame, in this case df_exams, then we open square brackets and write either single or double quotes, and then the name of the column. In this case, it's this one, math score.
This is the column we want to select. Now, instead of just selecting, we're going to perform operations. In this case, I want to calculate the total sum of this column, so we have to use the sum method. We write .sum with parentheses, and this is how you calculate the total sum of this column. To verify this, we run the code, and here we get about 66,000, which is the total sum of the math score column. Great. Now we can make some other calculations you would do in Excel. For example, we can count the number of rows using the count method. I'm just going to copy this, and instead of writing the sum method, we write count. We see 1000 rows, and this is correct because this data has 1000 rows. Now we can calculate the mean of the math score column. We copy this, paste it, and instead of writing count, we write mean. Here we get the average value of the math score column. To get the average, we sum all the rows in the math score column and then divide by the total number of rows, in this case 1000, and this is how you get the mean value. We can do other operations with methods too. For example, we can get the standard deviation by writing std. We execute this, and the standard deviation of the math score column is about 15. We can also get the maximum and minimum values. Let's do it quickly: first max, and then min. You can actually do this with Python built-in functions, but we can also do it with methods. I run this, and as you can see, the minimum value of the math score is 0 and the maximum is 100. Okay, now I'm going to show you a quick way to make the same calculations, using the describe method.
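The column operations listed above are sketched here on four placeholder scores, small enough to verify by hand: the sum is 72 + 69 + 47 + 90 = 278, and the mean is 278 / 4 = 69.5.

```python
import pandas as pd

# Placeholder rows standing in for df_exams.
df_exams = pd.DataFrame({"math score": [72, 69, 47, 90]})

math = df_exams["math score"]

total = math.sum()      # total of the column
count = math.count()    # number of non-null rows
mean = math.mean()      # total divided by count
std = math.std()        # standard deviation
highest = math.max()
lowest = math.min()

print(total, count, mean, std, highest, lowest)
```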
I think we saw the describe method in previous videos, but in case you don't remember it, I'm going to write it here. Actually, we don't need the name of a specific column; we only need the name of the data frame. Now we can use the describe method, so we write .describe with parentheses. Now we get a summary table with some important statistical values: the count, the mean, the standard deviation, and the minimum and maximum value. As you can see, we get all of this with one method. Okay, so far, so good. Now, instead of making operations in columns, we're going to learn how to make operations in rows. Let's calculate, let's say, the sum of the math score, reading score and writing score. To do so, we have to make some independent selections. To show you, I'm going to copy the names of these three columns and paste them here. Now we have our math score, reading score and writing score. Let me delete that sign, and now we make the independent selections. First, we write the name of the data frame, df_exams. To make the selection, we open square brackets and put the column name in quotes. Let me do this quickly for the others: I open square brackets here, and here too, and now it's ready. So here we made three independent selections. To calculate the sum in a row, we have to use the plus sign, the plus operator. We write it here and here. Basically, we're making a sum in each row. To verify this, we run this code, and as you can see, we get the sum of the score columns. Let's quickly verify the sum of the first row: it's 72 plus 72 plus 74. So 72 plus 72 is 144, and with 74 it's 218. Here we have it. It's correct.
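The row-wise sum just verified by hand can be sketched like this. The first row uses the 72, 72 and 74 mentioned above; the other rows are made up for illustration. The last line, which divides by three and stores the result in a column named average, is only a preview of the natural next step.

```python
import pandas as pd

# First row matches the values checked by hand; the rest are made up.
df_exams = pd.DataFrame({
    "math score": [72, 69, 90],
    "reading score": [72, 90, 95],
    "writing score": [74, 88, 93],
})

# Three independent column selections joined with the plus operator:
# pandas adds them row by row.
row_sum = (
    df_exams["math score"]
    + df_exams["reading score"]
    + df_exams["writing score"]
)
print(row_sum.tolist())

# Dividing the row sum by 3 gives an average score per row,
# rounded here to two decimals and stored as a new column.
df_exams["average"] = (row_sum / 3).round(2)
print(df_exams)
```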
So now, let's do something else. Instead of just summing these three rows, or actually these three columns, we're going to calculate the average to get an average score. Let me copy this. Here, we calculate the average by summing these and then dividing by three. This is how we calculate the score. And now let's assign this result to a new column. To do so, we only write the equals sign. As you might remember from previous lessons, we add a new column by writing the name of the data frame and then making something like a selection: we open square brackets, then open quotes, and write the name of the column that we want to create. This is the same as we did in previous lessons where we added a new column. In this case, I'm going to name this new column average. I execute this, and to verify that the new column was created, I'm going to show the data frame here below. Here is our data frame. In the last column, you can see that there is a column named average, and it has the average value of the math score, reading score and writing score. Now here, we can control the number of decimals. We can just use the round function and write the number of decimals we want. In this case, I want only two decimals. I run this, and as you can see, our data frame looks much better, because we only have two decimals. And that's it. In this video, we learned different ways to make operations in columns and rows on data frames.
Alright, now let's have a look at the value_counts method. So far, we have seen how to count the number of rows in a data frame. For example, if we want to count the number of rows in the gender column, we either use the len function: we write len, then the name of the data frame.
And we only have to write the name of the column. As you might remember, this gives us the number of rows. We can also use the count method: we write count, and we get the number of rows. But what if we want to count the gender elements by category, female or male? What if we want to know how many female and how many male elements are in this gender column? This is when value_counts comes in handy. We can use this method to count each category of the column. To use it, we only have to write the name of the data frame, followed by the column that we want to count, in this case the gender column, and then the value_counts method, as you can see here. Now we execute this. As you can see, we have not only the total rows in this gender column, but now it's divided by category: there are 518 females and 482 males. This is how the data is spread in the gender column. Now, we can do more with the value_counts method. We can get the percentage that each category represents in the whole column. I'm going to copy this, and to calculate the percentages, also known as relative frequencies, we have to add an argument named normalize. We write normalize equal to True and then execute this. As we can see, female represents 51% of the total observations in the gender column, while male represents only 48%. So the value_counts method is useful when you want to have a look at the data by category. Okay, now let's see another example, and in this case let's pick a different column. I'm going to choose this parental level of education column. I copy this, and now let's count the elements by category. I'm going to write the name of the data frame, df_exams.
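Before the second example continues, the gender counts described above can be sketched as follows. The five-row data frame is a hypothetical stand-in, so the counts are not the 518 and 482 from the course data.

```python
import pandas as pd

# Hypothetical stand-in for the gender column of df_exams.
df_exams = pd.DataFrame(
    {"gender": ["female", "male", "female", "female", "male"]}
)

# Count how many rows fall into each category.
counts = df_exams["gender"].value_counts()

# normalize=True returns relative frequencies (shares of the total)
# instead of raw counts.
shares = df_exams["gender"].value_counts(normalize=True)

print(counts)
print(shares.round(2))  # rounded to two decimals
```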
Now I open square brackets and quotes, and here I paste this column. To count the elements by category in this column, we use the value_counts method. We run this code, and here you can see how the data is divided in this column. Most people have some college level of education, while just a few people have a master's degree. And if we want to get the percentage that each category represents, we again use the normalize argument. We write normalize equal to True, and now we get the percentages. If we want to round these to two decimals, we use the round method: we write .round with parentheses, and inside, two decimals. As you can see, we rounded it to two decimals. And that's it. Now you know how to use the value_counts method.
Okay, in this video, we're going to see how to sort a data frame using the sort_values method. First, let's import and read the CSV file that we've been working with in this tutorial. Now let's show the data frame. Here we have it; as you might remember, it has these three numerical columns. I'm going to sort it using one of these columns with the sort_values method. First, I write the name of the data frame, which is df_exams, and then write sort_values. Now I open parentheses, and I can use this help here. As you can see, the only mandatory argument is by, and in this one we have to specify the name of the column we want to sort by. In this case, I want to sort by math score, so I'm choosing this numerical column to start with. I'm going to copy this one and paste it here: by equal to math score. And sorting this data frame is as simple as that.
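A minimal sketch of this first sort, using a small made-up data frame in place of the course file:

```python
import pandas as pd

# Hypothetical stand-in for df_exams with made-up scores.
df_exams = pd.DataFrame({
    "gender": ["female", "male", "female", "male"],
    "math score": [72, 0, 100, 47],
})

# 'by' names the column to sort by.
sorted_df = df_exams.sort_values(by="math score")
print(sorted_df)
```

The rows keep their original index labels, so the left-hand numbers appear shuffled after sorting.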
Now we can run this code. As you can see, the data frame was sorted ascending by default: it starts with zero and ends with 100 in the math score. This is how the sort_values method behaves by default. And here is one little detail: you don't need to specify the by keyword; we can omit it. We run this, and as you can see, it still works. We can also modify the default behavior of the sort_values method. We only have to add a new argument, the ascending argument. Let me show you. I'm going to copy this one first. In this case, we're going to sort descending by the same column, so we write a comma and then specify the ascending argument. We write ascending equal to, and here I want to show you something in this little help: ascending is set to True by default. This means it sorts ascending by default, but we can change this default behavior by setting ascending equal to False. And that's what we're going to do here: ascending equal to False, which means descending. Now I run this one, and as you can see, it is sorted descending by the math score column. It starts with 100 and ends with zero. But that's not all; we can do much more with the sort_values method. First, I'm going to show you how to sort by two different columns. Let's copy and paste this one. In this case, we're going to sort descending by multiple columns. Instead of writing only math score, we're going to add one more column, the reading score column. I copy this one to paste it here, but first we have to add the square brackets, because, as you might remember, when we write two or more columns we need square brackets. Now I write a comma and paste this reading score. Now I add quotes. And that's it.
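The two variations above, a descending sort and a sort by two columns, can be sketched like this. The scores are made up so that ties in math score make the second column matter.

```python
import pandas as pd

# Hypothetical scores with ties in "math score".
df_exams = pd.DataFrame({
    "math score": [100, 87, 100, 87],
    "reading score": [92, 90, 100, 95],
})

# ascending defaults to True; False sorts from highest to lowest.
desc = df_exams.sort_values("math score", ascending=False)

# A list of columns sets the priority: math score first,
# then reading score to break ties.
multi = df_exams.sort_values(
    ["math score", "reading score"], ascending=False
)
print(multi)
```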
That's everything you have to do to sort by multiple columns. Now I'm going to run this one. As you can see, it was sorted descending, first by the math score column and then by the reading score column. The priorities are set by the order of the list that we include here: the math score column is the first priority, and the reading score column is the second priority. And that's what you can see here. Now I'm going to show you a little detail. Let me copy df_exams. If I print this one, you can see that the changes we made weren't saved: the math score column still has the original values. This happens because the sort_values method, like many other pandas methods, only creates a copy of the data frame. So here we obtained a copy, but it doesn't update the data frame unless we add a new argument, which is the inplace argument. I'm going to show you here. But first, I'm going to delete this df_exams. Now I'm going to copy this one and show you how to update the values of this data frame. I copy it, so those are the same values, but now I add a new argument, the inplace argument. We write inplace equal to, and now I'm going to show you the default value. The default value of inplace is False. This means don't update the data frame, only create a copy. But if we set it to True, it means update this data frame. So I'm going to set it to True to update the data frame. I write True, and now I run this, and apparently nothing happens. But if we now print the df_exams data frame, we're going to see that it is sorted. But maybe you don't want to add the inplace argument.
If you still want to update the values of that data frame, you have another option that we used before, which is overwriting the data frame. For example, you can just delete the inplace argument and write df_exams equal to this. This overwrites the values. But in this case, we're not going to do that; we're going to add the inplace argument, as you can see here. Finally, we're going to see how to sort, but now not with numerical data but with text. As you can see, before, we sorted this data frame by the math score column, and that one has numerical data. In this case, we're going to sort it by race/ethnicity, which has text. First we're supposed to get group A, and then group B, C, D, and so on. So let's do this here. I'm going to scroll down. First, we write the name of the data frame, followed by the sort_values method, and then specify the name of the column. I'm going to copy race/ethnicity here, and it's done. Now I have the name of the column. I'm going to set ascending to True, and now the new argument we have to add to sort this is key. So we add key, then equal to, and in this case we're going to use a lambda function. I'm not sure if you're familiar with lambda functions, but they work similarly to the regular functions we've seen before in the Python crash course. In this case, it's going to behave a little bit differently. Let me show you. First, you have to use the lambda keyword, so we write lambda. Then we write the argument it receives. In this case, I'm going to write col, which stands for column. Then we write a colon and specify the operation we want to make on this variable. In this case, I write col again and then access the str attribute.
So I write .str, and then use the lower method. What we're saying here is: get the string values of the column and then transform them to lowercase. So we get the text data in lowercase. With these three arguments, we're saying: sort the values inside the race/ethnicity column, sort them ascending, and sort the text data of this column as lowercase. Here we have this A, B, C, D, E in uppercase, but it's going to be compared in lowercase and sorted by this text data. Now let's run this one and see the results. As you can see, we have this race/ethnicity column, and it's ordered ascending: we get A, then B, then C, then D, and so on. And that's it. These are the different ways to sort a data frame using the sort_values method.
Welcome back. In this video, we're going to see different ways to make pivot tables. If you're an Excel user, you've probably made many pivot tables in the past. In pandas, we can also make pivot tables, and in this case we use two different methods: the pivot method and the pivot_table method. In this video, we're going to see the difference between the two of them. First, let's see what the pivot method is. The pivot method reshapes data based on column values, and it doesn't support data aggregation. This means that this is not the regular pivot table you'll see in Excel, because you can only reshape data with the pivot method and you cannot do anything else. To better explain what the pivot method does, I'm going to show you an example. Here we have a little data frame with six rows and four columns. As you can see, there are many duplicate values. For example, in the column foo, the value one is repeated at least twice, and the same goes for the value two. Also, in the column bar, you can see that A, B and C are duplicated.
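A small data frame of this shape can be sketched as follows. The exact values are an assumption: they are taken from the example in the pandas documentation for pivot, which matches the description here (six rows, four columns, repeated values in foo and bar). The last lines show the reshape this video is about, with foo as the index, bar as the columns and baz as the values.

```python
import pandas as pd

# Assumed values, following the pandas documentation example for pivot.
df = pd.DataFrame({
    "foo": ["one", "one", "one", "two", "two", "two"],
    "bar": ["A", "B", "C", "A", "B", "C"],
    "baz": [1, 2, 3, 4, 5, 6],
    "zoo": ["x", "y", "z", "q", "w", "t"],
})

# Reshape only: each (foo, bar) pair appears once, so every cell of the
# result is filled with exactly one baz value and nothing is aggregated.
reshaped = df.pivot(index="foo", columns="bar", values="baz")
print(reshaped)
```

If a (foo, bar) pair appeared more than once, pivot would raise an error, because it has no rule for combining the duplicates.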
So when we have this type of data frame, we can reshape it to get a different view and make a better analysis. In this case, we can use the pivot method, as I'm going to show you right now. You only have to write the name of the data frame, followed by the pivot method, and then specify three arguments. The first one is the index. In this case, I'm going to reshape this data frame to set the column foo as the index. This means that the column foo will be in the position where the numbers from zero to five are right now, on the left. Next, you have to define the columns. These are the new columns that we're going to see in our reshaped data frame. In this case, I'm selecting the data inside the bar column as the new columns. This means that A, B and C will be the new columns in our new data frame. And finally, we have to choose the values we wish to show in this new data frame. In this case, I'm choosing the baz column, so all the values inside it will be shown in our new data frame. Now I'm going to show you the result of this pivot method. Here it is. As you can see, we have foo in the index, as I told you before, and A, B and C, which are data from the bar column, are now the columns in this new data frame. Also, the data inside the baz column is the only data displayed in this reshaped data frame. Now let's see why it is arranged this way: why 1 is here, 2 is here, 3 is here, and so on. Each value is defined by its row in the index and its column. So the value between index one and column A is 1. Why does that happen? Because if we go to our original data frame, which is here, we can find the pair one and A, and the value that corresponds to that pair is the number 1. Let's pick another one. For example, 5: here we have two and B.
And if we go to our original data frame, we have two and B, and the value that corresponds to that pair is 5. That's why this value is here, and that's how this new data frame was reshaped. Okay. And finally, we have the pivot_table method. This one creates a spreadsheet-style pivot table, so it is similar to the pivot table that we would find in Microsoft Excel, for example. And this one supports data aggregation. To explain more about the pivot_table method, as well as the pivot method, we're going to see some examples in the next video. This time, we're going to write some code so you can understand much better what we're doing.
Alright, now it's time to see how the pivot method works in action in pandas. First, as usual, we import pandas as pd. I import this library, and then we're going to use a different data set to work with this pivot method. To read this data set, we use the pd.read_csv method, and inside parentheses we write the name of the data set. In this case it's gdp.csv, which you can find in the notes of this video. This is the new data set. Now let's have a look. I run this one, and as you can see, we have data about GDP per capita in this column. Basically, this is how the GDP grew over the years for each country. Now I'm going to tell you which columns we're going to use for this example. First, we're going to use the country column, which contains data about different countries. Then we're going to use the year column, which contains different years, and the GDP per capita, which is in this column. Basically, what we want to do in this exercise is to obtain a different view of our original data set. The data set that we're reading here with pandas has this view, but we want a different view to make a better analysis.
So the goal of this exercise is to see the evolution of the GDP per capita over the years for each country. We're going to put the country names in the columns, and the only data we're going to show in our new data frame is going to be the GDP per capita that is here. I want to show you this with code now, so let's write it here. But first, let's assign this data frame to a variable. I'm going to write df_gdp. This is the name of my data frame, and now I'm going to show it. It's here. Now I'm going to copy this data frame, and to use the pivot method, I paste this one and write .pivot. Now we open parentheses. As you might remember from the previous video, we have to introduce three different arguments. If you don't remember them, you can just press the Shift and Tab keys on your keyboard, and you will get this. Here you can see the three arguments I'm talking about. First, we have to write the index argument, so we write index. As I told you before, I want the year column to be the index of my new reshaped data frame, so I'm going to set year as the index. I write year here. Next, we write a comma and press Shift and Tab to show this. The second argument is columns, so we write columns, then the equals sign, and open quotes. As I told you before, I want each of the countries listed in the country column to be an independent column. For example, let's say we have the United States. I want the United States to be column number one, then China as column number two, then Australia, then Spain, and so on. Each country should have one independent column. That's what we want, and to get that, we have to set the country column as the columns argument. So here, country, and that's it.
Now, again, Shift plus Tab to show this window. The third argument is values, so I'm going to write values equal to and open quotes. The only data I want to show in my new data frame is the GDP per capita, which is the one here. I'm going to copy this one and paste it here. So remember our goal: to see the evolution of the GDP per capita over the years for all the countries listed in this column. Now we're going to execute this code and see the result. I press Ctrl+Enter, and as you can see, I have the new view of this data frame. It looks much better and it's more readable, because we can see the GDP evolution over the years for each country. Now let's verify that everything is correct. We set the index to year, and here we have the year as the index, so that's fine. The columns should be country, and now we have each country in the columns, so it's correct. Next, the values are the GDP per capita. And yes, the intersection between a row and a column is the value that corresponds to the GDP per capita of that country in that year, so everything is working fine. And there you have it. This is how the pivot method works in pandas.
Okay, now let's see how the pivot_table method works in pandas. In this case, we're going to work with a different data set, and to read it we're going to use the pd.read_excel method, because this data set is not a CSV file but an Excel file. So we use read_excel for an Excel file. The name of this data set is supermarket_sales.xlsx, and this is what we see after we run this. Here you can see that we have different columns about what each person bought in a supermarket. We have the branch, the city, the gender and other data.
So here, to make a pivot table, we're first going to name this data frame, and I'm going to name it df_sales. Now I'm going to show it here. Okay, now it's here. The goal of this task is to see how much females and males spent in this supermarket. To do that, we're going to use the pivot_table method in pandas. First, I'm going to copy this data frame and paste it here. Now we're going to make a pivot table and add an aggregate function. Remember that the pivot_table method allows us to add an aggregate function, and the pivot method doesn't support that, so we're going to use pivot_table this time. Now we're going to introduce some important arguments. The first one is the index. In this case, if we want to see how much males and females spent in this supermarket, the index is going to be the gender. I'm going to copy gender here: index equal to gender. This is the first necessary argument. The second one is the aggregate function. We write aggfunc, that is a, double g, f, u, n, c, then the equals sign, and then the aggregate function we want to perform. In this case, it's going to be a sum, so we write sum, and now everything is ready. What we're supposed to get is the information about the sales in this data frame, but now divided by gender: the female category and then the male category. Let's verify this. I run this one, and as you can see, we have this summary table, or pivot table, and now it's divided by gender. We can see how much females spent here in the Total column, and also how much males spent. And here in the Quantity column, we can see how many products females and males bought in this supermarket.
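Here is a minimal sketch of this first pivot table. The data frame is a hypothetical four-row stand-in for df_sales with made-up numbers, and it keeps only Gender, Quantity and Total; the text columns are left out on purpose, since how pivot_table treats text columns under a sum may depend on the pandas version.

```python
import pandas as pd

# Hypothetical stand-in for df_sales with made-up numbers.
df_sales = pd.DataFrame({
    "Gender": ["Female", "Male", "Female", "Male"],
    "Quantity": [7, 5, 8, 2],
    "Total": [548.97, 80.22, 340.53, 45.10],
})

# One row per gender; every remaining column is summed.
by_gender = df_sales.pivot_table(index="Gender", aggfunc="sum")
print(by_gender)
```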
And one detail you might have noticed is that only the columns that contain numerical data are displayed here. For example, branch and city, which contain only text, aren't in this pivot table. That's because in the aggregate function argument we indicated that we want to sum, and when we sum values, we cannot sum text, only numerical data. So only the columns that have numerical data are displayed in this new pivot table. Okay. That's our first pivot table, and we can do even more. For example, we can select a pair of columns that we're interested in. Let's say we only care about the Quantity and the Total columns, so we want only those columns. I'm going to copy this one, and to show you how to get only those two columns, I'm going to add a new argument. In this case, the name of the argument is values. I write values equal to, and I'm going to select the Quantity and the Total columns. I open square brackets, because I'm going to select two or more columns, and inside I write the names of the columns: first Quantity, and then Total. So we're going to get the same pivot table, but in this case only the Quantity and the Total columns are going to be shown. I execute this one, and here I get an error, because I didn't include this comma. I add it here, and now everything should be fine. And yes, we get the same pivot table, but only the Quantity and Total columns are displayed. Here we can clearly see that females spent more than males in this supermarket. But we can get even more detail. So far, we know that females spent 167,000 in this supermarket. With pivot tables, we can even know on which product lines this money was spent. Let me show you: we can see how the money is spent using this product line column.
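The values argument, and the product-line breakdown that comes next, can be sketched as follows. The rows are made up, and the column names (Gender, City, Product line, Quantity, Total) are assumed to match the supermarket file.

```python
import pandas as pd

# Hypothetical stand-in for df_sales; names assumed, numbers made up.
df_sales = pd.DataFrame({
    "Gender": ["Female", "Male", "Female", "Male"],
    "City": ["Yangon", "Mandalay", "Yangon", "Naypyitaw"],
    "Product line": [
        "Sports and travel", "Health and beauty",
        "Sports and travel", "Sports and travel",
    ],
    "Quantity": [7, 5, 8, 2],
    "Total": [100.0, 50.0, 200.0, 25.0],
})

# values picks which columns to aggregate; a list selects several.
selected = df_sales.pivot_table(
    index="Gender", values=["Quantity", "Total"], aggfunc="sum"
)

# columns spreads one aggregated value across the categories of
# another column, here Total per gender and product line.
by_line = df_sales.pivot_table(
    index="Gender", columns="Product line", values="Total", aggfunc="sum"
)
print(selected)
print(by_line)
```

Combinations with no rows, such as Female and Health and beauty in this made-up data, show up as NaN.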
So we only have to add a new argument to this pivot_table method. I'm going to show you. First, we copy this and paste it here. We're going to make a pivot table that shows how much males and females spent in each category or, well, product line. So we add a new argument, and this one is going to be the columns argument. I add the comma, write columns, then open quotes, and here I write the name of this column, which is product line. I scroll up and copy this column, and then we're going to see in which category they spent the money: health and beauty, or sports, and so on. Now I scroll down, and here I paste it. Before I run this code, note that we only want to display the Total, because we want to see where the money goes, not the quantity. So only Total, and I delete the square brackets too. With Total, we're going to see where the money goes, divided by gender. I run this, because it's ready. And now, as you can see, we can see how much females and males spent in each product line. We can quickly see, for example, that females spent more money on fashion accessories than males, and that kind of makes sense. Also in sports, females spent more money than males. We can easily see all of that by using the pivot_table method in pandas, and this is similar to the pivot table you will find in Excel. And that's it. That's how you make a pivot table in pandas.
Alright, before showing you how to make visualizations with pandas, we first have to check the data set, and we also have to make a pivot table, so we can easily make the plots with pandas later. First, we have to import pandas to read this CSV file. I have this import pandas as pd, so we just run this code. Now let's read this new data set. As you might remember, to read a CSV file, we use the read_csv method. So we write pd.read_csv, and then we write the name of the CSV file. In this case, the name starts with population, and I'm going to use this population_total.csv, so I press Tab to autocomplete the name. Now we have the name, and I'm going to assign this to a new variable. The variable is going to be df_population_raw, so this is the raw data. Now we're going to have a first look at this data set. I paste this and run these two. And now we have this data frame. As you can see, we have the population of many countries throughout the years. For example, we have China, the United States, and India, with their populations. Here I used the name raw because this data set was extracted using some web scraping techniques and then wasn't modified. So now we have to make some changes to reshape this data frame, to make it easy for us to create visualizations with pandas later. What we have to do is make a pivot table to reshape this data frame, and that's what we're going to do here below. We're going to use the pivot method. As you might remember, the pivot method returns a reshaped data frame organized by the given index and column values, but it's a pivot without aggregation. This is what we want: we only want to reshape this data frame. We're going to start by dropping null values. We do that by writing the name of the data frame, so I copy the name and paste it here. To drop null values, we use the dropna method, so I write dropna and then run this. As you can see, the result is a copy of this data frame. But if we want to save the changes that we make to the data frame, we have two options. The first option is to use the inplace argument.
So I write inplace and set this to True. If we do this and run it, all the changes that we make to the data frame are going to be saved. The second option is to overwrite the content of this data frame. We write df_population_raw equal to the same data frame with .dropna, so we're overwriting the content of this data frame. I'm going to choose the first option just to write less code. I write inplace equal to True, and now I run it, and this data frame shouldn't have any null values. Okay, now it's time to make the pivot table. First, I'm going to show you what I'm going to do, so we have a better idea before writing the code. Here we have the original data frame, and what we're going to do is reshape it. I want the year column to be here in the index, instead of 0, 1, and so on. Then I want the values inside the country column to be here in the columns. For example, I want China in one column, then the United States in another column, and then India in another column. And I want the population data to be the only data here. To do that, we have to use the pivot method, and that's what we're going to do here below. First, we write the name of the data frame, which is this one, and then write .pivot. Then we open parentheses, and here let's see the arguments that this pivot method accepts. I press Shift and Tab to get this helpful, let's call it, cheat sheet. Now we have the arguments that the pivot method accepts: first the index, then the columns, and then the values. As I told you before, I want the index to be the year column.
So we have to write index equal to, open quotes, and I write year, then comma, and let's check another argument. The next argument is the columns. I want the columns to be the country, so the data inside the country column. So here I write columns, then I open quotes, and here I write country. And now the last one, I think, is values. And yeah, it's values. I want the values to be the population data. So let me see if that's correct. And yeah, it's here. So population, and I'm going to press Enter here so it looks much better. And now population is here. So I have the three arguments: the index, the columns, and the values. Now I'm going to reshape my original data frame, so here I press Ctrl+Enter. Now, as you can see, here we have the countries in the columns. So here we have many countries, from the first country, Afghanistan, to Andorra, Argentina, Uruguay, and many other countries. We also have the year, so it's here, the year from 1955 to 2020. So we can see here the evolution of the population throughout the years for all the countries in this dataset. But as you can see, there are many countries. So what we can do here is to select just some countries, so we can simplify our visualizations later in pandas. Here, I'm going to select some columns. But first, I'm going to give this new data frame a name. I'm going to name it df_pivot. So this is my new data frame. Now I'm going to rearrange this, and now it looks much better. So now I'm going to run this. And now let's select some countries. I copy this pivot data frame, and now we open square brackets, double square brackets, to select two or more columns. And here, let's write some countries: first the United States, then, let's say, India, then China, and two more countries, Indonesia and, last but not least, Brazil. So here we have the five countries.
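The reshaping and column selection described above can be sketched roughly like this. The column names match the ones used in the video; the rows and the country names are made-up placeholders so the snippet runs without the real file.

```python
import pandas as pd

# Made-up long-format data (placeholder values, hypothetical countries).
df_population_raw = pd.DataFrame({
    "country": ["Country A", "Country B", "Country C"] * 2,
    "year": [1955, 1955, 1955, 2020, 2020, 2020],
    "population": [100, 50, 20, 150, 80, 25],
})

# Reshape without aggregation: years become the index, countries the columns.
df_pivot = df_population_raw.pivot(index="year", columns="country", values="population")

# Double square brackets select two or more columns and return a data frame.
df_pivot = df_pivot[["Country A", "Country B"]]
print(df_pivot)
```

Note that pivot raises an error if the same index and column pair appears more than once, which is one reason to use it only when no aggregation is needed.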
So I run here, and we have these five countries and their population from 1955 to 2020. So great, now we've simplified this data frame. And now I'm going to overwrite the content inside the data frame df_pivot. I'm going to write here df_pivot equal to df_pivot with the selection, so I'm overwriting the content. So I press Ctrl+Enter, and our new df_pivot is here. And now I'm going to show it to you. This is our new df_pivot data frame. And that's it. Now our data is ready, so we can use it to make visualizations with pandas. And that's what we're going to do in the next video. Okay, now it's time to make some visualizations with pandas. Here I have the data frame that we created. This is the pivot table we created in the previous video. And as you can see, here we have five countries in the columns, and here we have the year in the index, from 1955 to 2020. So what we're going to do now is to make our first visualization. I scroll down here, and the first one is going to be a line plot. First, to make this visualization, I'm going to copy the name of the data frame and paste it here. Now, to make plots with pandas, we have to use the plot method. So we write .plot, and now I open parentheses. One necessary argument we need to introduce is the kind argument, so I write kind, now equal to, and here I have to write the kind of plot we want to make. In this case, it's a line plot, so we write line. And this is actually the one argument we have to introduce here. Now we can run this code, so I press Ctrl+Enter. As you can see, here I have the line plot. In this line plot, we can quickly see the evolution of the population throughout the years.
For example, China and India, which are the green and orange lines, had fast-growing populations, while the United States, Indonesia, and Brazil have lower populations. I mean, also, their population didn't change so much in the past 50 years. Here, we can add more arguments to this plot method to customize this line plot. So here we can introduce another argument, which is the xlabel. This xlabel is what you can see here. When we created this line plot, by default it was assigned this year label, but we can change it. So for example, let's say we want to write year, but now with a capital letter, so we write Year here. And now let's say we want to add a new label here on the y axis. Here we only have to write ylabel, then equal to, open quotes, and here we write the name we want. In this case, I'm going to write only Population. And finally, we can also add a title. We can add any title we want. In this case, I'm going to write, well, the name of the argument first, title, then equal to, and then the title is going to be, let's say, population from 1955 to 2020. So this is the title. Let's run this. And as you can see, here we got the title, population 1955 to 2020, and the x label and y label were modified too. Finally, we can add one more argument. In this case, the argument is the size of the figure. To change the size of this figure, we can add the argument named figsize, and this is a tuple, so we have to open parentheses. And now, to edit the size, we have to add two values. The first is the size of the x axis and the second is the size of the y axis. In this case, I'm going to set it to eight and then four, which means that the x axis is going to be long, while the y axis is going to be short. So here I'm going to run this code, and let's check it out. So here the figure has a different size.
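Putting the line plot arguments together, a minimal sketch could look like the one below. The data frame is a made-up stand-in for df_pivot, the exact title string is an assumption, and the Agg backend line is only there so the snippet runs outside a notebook. As far as pandas is concerned, kind defaults to "line", so writing it out is optional; figsize is given as (width, height) in inches.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs outside a notebook
import pandas as pd

# Made-up stand-in for df_pivot (placeholder values, hypothetical countries).
df_pivot = pd.DataFrame(
    {"Country A": [100, 120, 150], "Country B": [50, 65, 80]},
    index=pd.Index([1955, 1990, 2020], name="year"),
)

ax = df_pivot.plot(
    kind="line",
    xlabel="Year",
    ylabel="Population",
    title="Population from 1955 to 2020",
    figsize=(8, 4),  # (width, height) in inches
)
```

The call returns a matplotlib Axes object, which is what a notebook prints as text above the chart when the plot call is the last expression in a cell.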
And that's how you can customize this line plot. Okay, now let's make a bar plot with pandas. The first thing we have to do is to select only one year. The bar plot only accepts one year, so we can plot the population of different countries. So let's select one year of the data frame we had before. I'm going to copy the name of the data frame so you can check it out again. So this is the data frame, and we're going to select one year. To do that, we have to use the index attribute and then the isin method. First, I'm going to show you the index attribute, in case you don't remember. The index attribute allows us to see all the index values in this data frame. So we have here from 1955 to 2020. That's what the index attribute does. And now, if we use the isin method, we can filter out some index values. So here, let's say we want to select only 2020. So I copy 2020. And now here I write equal to, and first I'm going to make the selection. So it's here. And now I'm going to show you what the result is. So here I press Ctrl+Enter, and the result is this little data frame that only contains the population in the year 2020. This is important because the bar plot is supposed to show only the population in this year. So here we have it. And now what we're going to do is to name this data frame. So here we write equal to, and then let's give it a name. I'm going to name it df_pivot_2020. So here I press Ctrl+Enter. And now I'm going to show this new data frame. Well, again, here. And here, one little detail I have to tell you is that when we make a bar plot, we have to put the text data in the index. So here, the names of the countries should be in the index. To do that, we have to use the transpose method. This transpose allows us to switch rows and columns, and vice versa.
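Here is a small sketch of the one-year selection and the transpose, using a made-up stand-in for df_pivot. The mask variable is introduced only for readability; in pandas the transpose is available as the .T attribute.

```python
import pandas as pd

# Made-up stand-in for df_pivot (placeholder values, hypothetical countries).
df_pivot = pd.DataFrame(
    {"Country A": [100, 120, 150], "Country B": [50, 65, 80]},
    index=pd.Index([1955, 1990, 2020], name="year"),
)

# Boolean mask: True only for index values contained in the list.
mask = df_pivot.index.isin([2020])
df_pivot_2020 = df_pivot[mask]

# .T swaps rows and columns, so the country names end up in the index.
df_pivot_2020 = df_pivot_2020.T
print(df_pivot_2020)
```

Passing a list to isin is what makes the same pattern work later for several years at once.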
So here we can easily do that by writing the name of the data frame and then .T. If we now run this code, we can see here that we have this. So now the year 2020 is in the columns and not in the index anymore, and the country names are in the index here. This is the format we need to have before making the bar plot. So now I'm going to overwrite the content in this data frame. I write df_pivot_2020 equal to this same data frame, but with .T. So here I run this, and now it's time to make the bar plot. Here I copy the name of the data frame, and now I use the plot method. So I write .plot, again open parentheses, and the first argument is the kind. So I open quotes and we write bar. Now it's ready, and we can run it. As you can see, we have a basic bar plot. It has some default values, like the name of this x label, and also the default color is blue. We can customize this bar plot a bit more. For example, I want a different color. So I write the color argument and then open these quotes, and let's say I want it to be orange, so I write orange. Also, we can change the x and y labels. Actually, I can copy this here, so I can save some time. So xlabel and ylabel are here, and let's paste them here. And finally, I can also add the title, which was here, so I copy and paste it. But in this case the title is a bit different, because it's not from 1955 to 2020, it's only 2020. So here I have only 2020. And now let's run this to see the results. You can see here we have the title, the x and y labels, and the bar plot is in orange. So that's how you customize the bar plot. Alright, so far so good. Now let's go one step further by making bar plots grouped by n variables. Here, we have to select a group of years to make these bar plots grouped by n variables.
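The single-year bar plot with a custom color might be sketched as follows. The data frame is a made-up stand-in for the transposed df_pivot_2020, and the label and title strings are assumptions rather than the exact ones typed in the video.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs outside a notebook
import pandas as pd

# Made-up stand-in for the transposed df_pivot_2020 (countries in the index).
df_pivot_2020 = pd.DataFrame(
    {2020: [150, 80, 25]},
    index=pd.Index(["Country A", "Country B", "Country C"], name="country"),
)

ax = df_pivot_2020.plot(
    kind="bar",
    color="orange",
    xlabel="Country",      # assumed label text
    ylabel="Population",
    title="Population in 2020",
)
```

Each index entry becomes one bar, which is why the country names need to be in the index before plotting.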
So I'm going to copy this code we used before to select only the year 2020. I'm going to copy this, and in this case I'm not going to select only one year, but a group of years. So let me show you here. I'm going to show you the pivot table again so you can easily understand. Instead of choosing only 2020, I'm going to choose some other years here, so I'm going to delete this and write them here. Let's say 1980, 1990, then 2000, then 2010, and finally 2020. So we have a group of years here, and we're selecting them using the index and the isin method. Here, I'm going to give it a different name. In this case, since it's a sample, I'm going to write df_pivot_sample. First I'm going to show you this one, so you can see what it looks like. So now we have five countries and now five years. Now I'm going to assign this to my data frame, df_pivot_sample. I run this, and now we have this new data frame. So it's time to make this grouped bar plot. Here we write the name of the data frame and then the plot method. So we write .plot, and now let's add the first argument, which is kind, equal to bar. Now we run this. And as you can see, here we have the bar plots grouped by year. So here's 1980, then 1990, and so on. You can also add the same arguments we added here. For example, I can add the x and y labels, so I can do it here. I'm going to do it fast. So here I run, and as you can see, we modified the x and y labels. And that's it. That's how you make bar plots with pandas. Okay, in this video, we're going to learn one of the most common charts that we can make in pandas and actually in any other visualization tool, and these are pie charts. So before we make this pie chart, first let's have a look at the data frame we're going to use.
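For the grouped bar plot, a rough sketch with a made-up stand-in for df_pivot is shown below. Here the years stay in the index and the countries stay in the columns, so no transpose is involved; each year forms a group with one bar per country.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs outside a notebook
import pandas as pd

# Made-up stand-in for df_pivot (placeholder values, hypothetical countries).
df_pivot = pd.DataFrame(
    {"Country A": [100, 110, 120, 150], "Country B": [50, 60, 65, 80]},
    index=pd.Index([1955, 1980, 1990, 2020], name="year"),
)

# Keep only the rows whose index value is in the list of years.
df_pivot_sample = df_pivot[df_pivot.index.isin([1980, 1990, 2020])]

# One group of bars per year, one bar per country inside each group.
ax = df_pivot_sample.plot(kind="bar", xlabel="Year", ylabel="Population")
```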
In this case, to make a pie chart, we're going to use the same data frame we used for making the bar plot, because it follows the same logic. So here I'm going to copy the data frame we created for the bar plot, which is this one, df_pivot_2020. This is what we created before by using the index attribute and the isin method. So here I'm going to copy this, and now I'm going to show it to you here so you can remember what's inside this data frame. And it's here. As you can see, we have the column 2020, and the countries are in the index. So everything is fine. That's the format we need for making the pie chart. But there is one little thing we have to modify, and this is the column name, because now it's 2020, and this is a number. Actually, I think it's an integer. It's not a good practice to have numbers as column names, so what we have to do is to make this a string. To do that, we use the rename method. So we write .rename, open parentheses, and now we use the columns argument. So we write columns, then open these curly braces, and now we write the name of the column we want to change, which is 2020. And we're going to turn this integer value into a string, so we open quotes and write 2020. Apparently they are the same, but the green one is an integer and the red one is a string. Now, to save these changes, I'm going to write inplace equal to True, and I'm going to run this. So now we can make the plot. Here I'm going to write the name of the data frame, and now I'm going to use the plot method. So here I write .plot. The first argument is kind, and here I write pie. So the kind is pie, and now I run this. And here, I forgot to include the y argument, so I'm going to write it here. The y argument is supposed to have the data. So in this case, I'm going to show you the data frame here again. So the data is here in 2020.
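The renaming step can be sketched like this, again with a made-up stand-in for df_pivot_2020. The dictionary passed to columns maps the old label (the integer 2020) to the new one (the string "2020").

```python
import pandas as pd

# Made-up stand-in for df_pivot_2020 with an integer column label.
df_pivot_2020 = pd.DataFrame(
    {2020: [150, 80, 25]},
    index=pd.Index(["Country A", "Country B", "Country C"], name="country"),
)

# Map the integer label 2020 to the string "2020" and keep the change.
df_pivot_2020.rename(columns={2020: "2020"}, inplace=True)

print(df_pivot_2020.columns.tolist())
```

After this, the column is addressed with the string "2020", for example when it is passed as the y argument of a pie chart.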
So we should write 2020 here. I'm going to delete this. And here, in the y argument, you write the column that has the data. So that's what we did. Now I run this, and now we finally have our pie chart. So here is the pie chart. That's how you make a pie chart. If you want, you can even add another argument, like the title, for example. Here I can say that this is the population in 2020, but in this case in percentages. So I write this, and now we have this title. So that's how you make a pie chart in pandas. Alright, so far we've made a pivot table and many plots using pandas, and in this video we're going to learn how to export the pivot table and also the plots we made with pandas. Let's start by exporting the plots we made with pandas. To do that, first we have to import matplotlib. So we write import matplotlib.pyplot, and then we write as plt. So this plt represents matplotlib.pyplot. Now we run, and we import matplotlib. And now we can use this plt to save the plot. So we write plt.savefig, and now we open parentheses, and here we have to write the name of the file we want to export. Here I'm going to write my_test.png. So this is the extension, and this is the name of the file. And now, before exporting this file, I'm going to show you something here. You probably know that when we make a plot with pandas, we get these words here that say AxesSubplot and all of this. We can get rid of these words by using the show method. So we write plt.show with parentheses. And if we run this, we're going to export this figure, and we're also going to get rid of these words. So let's try to run this. And as you can see here, all those words disappeared, and we also exported the figure to a PNG file. Now this file should be located in the same folder where you have this Jupyter Notebook file. Okay, I'm going to open that file.
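A minimal sketch of the pie chart followed by the export to PNG is shown below. The data frame is a made-up stand-in for df_pivot_2020 with the string column "2020", the title text is an assumption, and the file is written to a temporary folder here rather than next to a notebook. In a notebook you would typically call plt.savefig before plt.show, in the same cell as the plot, since the figure is closed once it has been shown.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for df_pivot_2020 after renaming the column to a string.
df_pivot_2020 = pd.DataFrame(
    {"2020": [150, 80, 25]},
    index=pd.Index(["Country A", "Country B", "Country C"], name="country"),
)

# y names the column that holds the data for the slices.
ax = df_pivot_2020.plot(kind="pie", y="2020", title="Population in 2020 (%)")

# Save the current figure; the file extension decides the output format.
out_path = os.path.join(tempfile.mkdtemp(), "my_test.png")
plt.savefig(out_path)
plt.close("all")
```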
But first, I'm going to export the pivot table. So here I copy this df_pivot and paste it here. In order to export it, we have to use the to_excel method. So we write .to_excel, and now I open parentheses. Here we write the name of the file where we're going to export this pivot table. In this case, I'm going to name it pivot_table.xlsx. So this is the Excel extension, and this is the name of the file. Now I run this, and now the pivot table should be exported. Alright, now I'm going to open the Excel file and the PNG file we created. So it's here, and here we have the plot we exported, and also the pivot table. As you can see here, the plot looks exactly the same as the one we created here with pandas. And the pivot table is the same. So I'm going to show you how the pivot table looks. Here is the pivot table, and here is the pivot table we exported. I opened it in Google Sheets, and it looks exactly the same. And that's it. In this video, you learned how to export data frames as well as plots.
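The Excel export can be sketched as below, with a made-up stand-in for df_pivot. One assumption worth stating: to_excel relies on a separate Excel writer package such as openpyxl, which is not mentioned in the video, so the sketch checks for it first and skips the export if it is missing. The file goes to a temporary folder here instead of the notebook folder.

```python
import importlib.util
import os
import tempfile

import pandas as pd

# Made-up stand-in for df_pivot (placeholder values, hypothetical countries).
df_pivot = pd.DataFrame(
    {"Country A": [100, 150], "Country B": [50, 80]},
    index=pd.Index([1955, 2020], name="year"),
)

out_path = os.path.join(tempfile.mkdtemp(), "pivot_table.xlsx")

# to_excel needs an Excel writer engine; openpyxl is a common one for .xlsx files.
has_engine = importlib.util.find_spec("openpyxl") is not None
if has_engine:
    df_pivot.to_excel(out_path)
    # Read the file back to check that the index and columns survived the export.
    round_trip = pd.read_excel(out_path, index_col=0)
    print(round_trip)
else:
    print("openpyxl is not installed; skipping the Excel export")
```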
Info
Channel: freeCodeCamp.org
Views: 119,811
Rating: 4.9690266 out of 5
Id: WcDaZ67TVRo
Length: 237min 46sec (14266 seconds)
Published: Wed Nov 24 2021