Python for Data Science

Captions
Well, hello internet, and welcome to a pretty crazy video. In this one video I am going to teach you the core of what you need to know to get the vast majority of everything you could ever want done with NumPy, pandas and Plotly, and I have a lot to do, so let's get into it.

You might say to yourself, well, that's promising a lot. I'm going to show you how to create arrays, matrices and randomized arrays, and how to reshape arrays; it's extremely important to be able to organize your data the way you want. I'll show you how to filter and how to use different statistics. Then I'm going to get into pandas: I'm going to show you how to read from CSV files, Excel files, even tables on HTML sites, and how to organize and clean up your data in numerous different ways. I'm going to show you how to replace column names and remove odd characters. I'm going to cover how to select individual columns in multiple different ways, how to work with multiple columns, grab rows of data, add columns, delete columns, delete rows and manipulate your data in countless different ways. All of these files are available in the description and everything is heavily commented.

We're going to get into Plotly. We're going to make really sophisticated types of graphs, add in different styling, and add the ability to zoom in on data with sliders. I'm also going to show you how to add buttons to your graphs to automatically change your data just by clicking a button. We're going to get into candlesticks, sophisticated shading, drawing different shapes on your plots and adding annotations. I'm going to cover open-high-low-close charts and multiple different plots at the same time, and we're going to have sliders on all of this data. This information is extremely useful, especially if you're interested in data science and, of course, in finance in general, and I'm going to cover everything right now. All right.
So I have a lot to cover, so I went and put in here all the different libraries you're going to need. Another tip is this guy right here: occasionally, whenever you are using Anaconda, you'll come across situations where you have variables that you previously had in your code and you want all the variable values to be reset. This is what you use right there. It's just a tip that I have told people about in comments, so I bring it up.

Okay, so we have NumPy, and you're going to see tons of examples of how to use it. NumPy quite simply allows you to use large multi-dimensional arrays, amongst numerous other capabilities, math operators and so forth, as you will see. pandas extends Python with more data manipulation and analysis. The data reader is going to be used to read information from online and also from your files, and it provides data in a data frame format; we'll get more into data frames. Matplotlib is going to cover plotting, and this is just going to keep silly little warnings from popping up. You can leave it on or take it off, it's up to you.

Constants. Now, I tested everything here on Mac as well as Windows. Actually, I did everything on Windows before Mac, so I know everything works. One thing you'll need to change is your path. Of course, I doubt you have a Derek Banas directory, so you're going to have to point at where your files are. All of the files, as well as this file right here, are available in the description for free from GitHub.

All right, the first thing we want to do is create a NumPy array. We're just going to create a list and throw some different values inside of there; it doesn't matter what they are. To convert this into a NumPy array, we can just call this NumPy array 1. All you need to do is say np.array and then pass your list inside of it, and then, if you want to go and see what it looks like, there it is. All right, so now it is a NumPy array. Now you can create
arrays in ranges of values, as well as steps and so forth. I guess I didn't even mention it: NumPy is quite simply an amazing scientific computing library. It is used by numerous other Python data science libraries, it contains many mathematical, array and string functions, and it is an extremely useful library. Along with all the basic math functions, you're also going to find functions for linear algebra, statistics, simulations and so forth. All right, so there's a rundown of what exactly NumPy is.

Let's say we want to create an array in a range quickly, and let's start off with something simple. Let's go NumPy array, and you would just say np.arange, like a range of values, and this is going to give you values from 0 to 4. You can just go like this and like this, and there you can see we generated those. Now let's say that you would like to step through the values. Say you would like to have 0 through 9 and you want to show only the even values. How could you do that? Well, let's create another one here, and again np, which stands for NumPy, arange like that, and we can go zero through ten (remember, it's not going to include the last value) and then we can say two like this, and NumPy array 3, and you can see that's another way. You could also use mathematical operators and so forth inside of here for generating the different steps.

Now, matrices are extremely useful, especially with data science, so we're going to spend a lot of time with those. Let's say I want to create a four-row matrix with three columns, and I want every single value to have the value of one. Well, let's go matrix is equal to np.ones, and what did I say I want? I want four rows; you put your rows in first and then your columns, like that. There you can see I created exactly that, four rows of three columns each. All right, now what I
want to do is create a four-row matrix with three columns, all having the value of zero. Well, guess what, you do almost exactly the same thing, so I'm just going to copy this, paste it inside of here, and let's call this number two just to keep everything different. A lot of people use the same variables over and over again going down these sheets. I do not. Do whatever you would like, but I think it works much better to take the time and use different ones. All right, and you can see there we did that.

We're also going to be able to generate random values inside of matrices, and this is extremely useful in data science. You just say np.random, and we want random integers, so randint. The random values are going to be between 0 and 50, and then I define my matrix: it's going to have four rows and three columns. We can go like this, and you can see it generated totally random values, and each time you update it, it's going to continue doing so. I'll show you later how to generate random values and know that they will always be the same, using something called a seed.

Another thing that's kind of useful: sometimes the values aren't random, and you want to generate a certain number of values between two different numbers. How do you do that? Well, you just come in here and say np.linspace, and let's say we want 1 through 10, and we want 10 values to be generated. We can do it, and you can see what those are right there. You can also see that they change if you change the starting value; it's going to give equally spaced values between those points.

Now, something else that is extremely useful is being able to reshape arrays. I'm going to create a new array here, and it is going to have 12 random values, so I'm just going to go in
and say np.random and, if you can remember, randint, and I'm going to say 0 through 50 and 12 random values, exactly like that. We can output it and see that yes, indeed, they are random. All right, so we have this guy generated, and it's just a simple array.

Let's say we want to reshape it into a three-row, four-column array. There are 12 values right here, and what is 3 times 4? 12. That gives you a hint of how you have to set these up, because if you said you wanted to reshape it into a three-row, five-column array, it isn't going to work. Make sure you understand that. So what can we do? Let's just keep this as matrix 6 (I'm breaking a rule that I told you about previously). I want to reshape it into three rows and four columns, and just like that, I was able to do this.

Let's say that I now want to reshape it into a 3D array with three blocks. That is the z value, or however you want to refer to it; some people refer to them as sheets. It's a three-dimensional array, that's what it is. So we want to reshape into a 3D array with three blocks, two rows and two columns. Let's give it a different name: I'm going to say matrix 7 is equal to matrix 6, as you see, and reshape, if I can spell reshape right. I said I want three blocks, that's what that three is, and then I want two rows and two columns. There it is, matrix 7, and you can see how that is structured.

All right, now I'm going to reshape into a 3D array with two blocks, three rows and two columns, just to show you how things look a little bit different. I'm going to copy this to save me time and you time, and everybody will be happy. Throw this inside of here, and I'm going to change this into, what did I say, two blocks, and I want three rows, so three, and again 2 times 3 times 2 equals 12.
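The array creation and reshaping steps described so far can be sketched roughly as follows. This is a minimal sketch, not the notebook from the video: the variable names and the list values are made up for illustration, and only the NumPy calls themselves (np.array, np.arange, np.ones, np.zeros, np.random.randint, np.linspace, reshape) come from the walkthrough.

```python
import numpy as np

# Array from a plain list (the values are arbitrary).
np_arr_1 = np.array([1, 2, 3, 4, 5])

# Ranges: 0 through 4, then the even values from 0 up to (not including) 10.
np_arr_2 = np.arange(0, 5)
np_arr_3 = np.arange(0, 10, 2)

# Matrices of ones and zeros: rows first, then columns.
ones_4x3 = np.ones((4, 3))
zeros_4x3 = np.zeros((4, 3))

# Random integers from 0 up to (not including) 50, as 4 rows by 3 columns.
rand_4x3 = np.random.randint(0, 50, (4, 3))

# Ten equally spaced values from 1 to 10 inclusive.
spaced = np.linspace(1, 10, 10)

# Twelve random values reshaped three ways; each shape multiplies out to 12.
flat_12 = np.random.randint(0, 50, 12)
rows_3x4 = flat_12.reshape((3, 4))
blocks_3x2x2 = flat_12.reshape((3, 2, 2))
blocks_2x3x2 = flat_12.reshape((2, 3, 2))

# A 3-row, 5-column shape needs 15 values, so NumPy raises ValueError.
try:
    flat_12.reshape((3, 5))
except ValueError as err:
    print("reshape failed:", err)
```

The failed reshape at the end is the "three-row, five-column" case from the explanation: the product of the new dimensions has to equal the number of values.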
So that all works out wonderfully. We can run it, and you can see that's exactly what we did. But how do you get these values out of here? Quite simply, if I have matrix 8, and let's keep this on here so you can see it, let's say I want to get the second block. These are zero-indexed, so the first block has an index of zero and this one has an index of one. Yes, I know it's weird, but that's the way it is, and if you're used to programming you are probably well aware. I want the second one, so I'm going to put a 1 inside of there. Then I want the third row, so that's going to be 2, and then I want the first column, and that is going to be 1, or no, it's going to be 0. There we are, and you can see we got a value of one. What is it? I said second block, that's this guy right here; I said third row, that's this guy right here; and then I said first column, that is one, and that's how I got it.

Another thing you're probably going to want to do is filter arrays. I'm going to print out matrix 6 just so you can see how things change, and what I specifically want is a Boolean array that shows which values are above 20. How do I do that? I can just say matrix 6 greater than 20.
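As a rough sketch of the indexing and comparison just described: the example below uses a predictable 0-to-11 array and a small hand-written grid as stand-ins for the random matrices in the video, and the names blocks and grid are made up.

```python
import numpy as np

# Stand-in data: 0..11 instead of random values, so the result is predictable.
blocks = np.arange(12).reshape((2, 3, 2))

# Second block (index 1), third row (index 2), first column (index 0).
value = blocks[1, 2, 0]
print(value)

# Comparing an array with a number yields a Boolean array of the same shape.
grid = np.array([[49, 3, 28], [7, 21, 20]])
mask = grid > 20
print(mask)
```

Note that 20 itself is not greater than 20, so its position in the mask is False.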
Exactly like that, and you can see a True value here. Why? Because that's 49. There's 49, there's 28, they're all True. This one should not be True, and it isn't, it's False. So why is that useful? Well, I can take a gigantic matrix and return an array with just the values that meet a specific condition. I could do something like matrix 6, and inside the brackets matrix 6 greater than 20, exactly like that (get rid of that period, though), and boom, it gives you just those values. You can put any condition inside of here and it will filter out just what you're looking for.

Now, there are countless different statistical operations. This tutorial is a little bit more focused on finance above all else, but obviously this is going to be used in multiple other areas of data science. What I want to do here is generate 50 random values between 0 and 100 and give you a small sampling of the types of statistical operations built into NumPy. I'm going to say randint like this, 0 to 100, and let's say 50.
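A minimal sketch of the Boolean filtering above, together with the kind of summary statistics this part of the video introduces and the seed mentioned earlier. The grid values and variable names are illustrative assumptions; the seed value 500 is one of the arbitrary numbers suggested in the video.

```python
import numpy as np

grid = np.array([[49, 3, 28], [7, 21, 20]])

# Indexing with a Boolean array keeps only the values where the mask is True.
# The result is a flat, one-dimensional array.
above_20 = grid[grid > 20]
print(above_20)

# Fifty random integers from 0 up to (not including) 100, plus a few statistics.
sample = np.random.randint(0, 100, 50)
stats = {
    "mean": sample.mean(),
    "std": sample.std(),
    "var": sample.var(),
    "min": sample.min(),
}
print(stats)

# Seeding the generator right before a draw makes that draw repeatable.
np.random.seed(500)
first = np.random.randint(0, 50, 10)
np.random.seed(500)
second = np.random.randint(0, 50, 10)
print(np.array_equal(first, second))
```

The variance here is simply the standard deviation squared, which is one way to read "variance is a version of that".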
There we are, and if you want to see what it looks like, there it is, gigantic. Okay, so let's say I have all these randomly generated values and I want to find the mean for them. How can I do that? I can just say matrix 5 and mean, like that. Let's do a whole bunch of them so you can see them all at one time; I'll copy that and show you about five that you're going to use a lot. We have standard deviation, and this guy over here is just going to be std. We're also going to be able to find variance. I'm not going to get into the specifics of what everything is: standard deviation is just a measure of how spread out the values are, and variance is a version of that. I have other tutorials that cover all this stuff in massive detail; look in the description and you'll see a data science list of everything, a statistics tutorial and so forth. All right, so minimum, and min like that. I wish I could put all those different things inside of here, but this would go on forever, and the whole point of this video is to cover as much as humanly possible as quickly as humanly possible. You can see right there all of the different values, and if we run it again, you're going to see that those values change. So, great.

I mentioned before that sometimes it's very useful to be able to generate random values and replicate that randomization. One way you can do that is to define what is called a seed. It doesn't matter what this value is; you can throw 500 in there, you can throw 10,000 in there, whatever you'd like, and you'll know that whenever you use it, it is going to give you the same random values every single time you run it. Just to prove this to you, let's go 50 and 10 like that, and matrix 9, and if I come back up here and run it again, you see you get the same values. But worry not, the next
time you go and generate a random array, it will be random once again. So randint, 0 and 50 and 10 like that, and matrix 10, and you can see that if we run this again, all of a sudden, magically, you get random values, while these ones are still the same. So there you go, that is a quick rundown of a lot of the most commonly used things you need to completely understand when it comes to NumPy, and now I'm going to do the same for pandas.

Basically, pandas is going to enhance NumPy, and it does so by providing numerous tools to work with tabular data like you'd find in spreadsheets, databases, CSV files, Excel files, all that. It's widely used for data preparation and cleaning as well as analysis, it can work with a wide variety of different types of data, and it has multiple visualization options, but I'm also going to show you Plotly, which I think is the best when it comes to visualization.

Okay, so we are going to read data from a CSV file, an Excel file, multiple different types of files, and you can create different functions for that. I'm going to go like this: I'm going to say get data frame (that is the special type of array that we have inside of pandas), just to keep everything as simple as humanly possible. Let's say I have stock information and I want to read it. Anytime you're reading data from a file, or from the internet, or from wherever, there's the potential that you could get some type of error, so that's why I'm putting this inside of a try block. Up here I have the path; it's a path to all of my stock information. If you don't keep anything in separate folders you do not need to do this, but I'm showing you so that you know. Otherwise you would just put whatever your file name is, if it's in the same directory as the code that is executed. So I have everything set up here: I have the path and the ticker, and that is then followed by a... it's funny, I just forgot where the plus sign is. Okay,
so I got it, and .csv, and make sure you put a dot here. If you don't have this in a separate directory, you would just put whatever your ticker is, but mine is in a separate directory, so I'm showing you how that works. Now, let's say you have a certain column in your data that you would like labeled as the index; that is very commonly done with dates. You say index_col is equal to Date, and then you can say parse_dates is equal to True, and that will automatically turn Date from a simple column into an index you can search with.

Anytime you have a try block, you have to have some type of exception handling. Chances are it's going to be a FileNotFoundError, so I'm going to throw that inside of there, and I'm just going to put pass inside here, which means it's going to do nothing, but you could do something like print "file doesn't exist". All right, so else: if we have no problem, we're going to return our data frame.

What's useful about this is we could come in and say something like Microsoft data frame is equal to, and then call get data frame from CSV and throw Microsoft's ticker inside of here. If we want to see what we get back, we just go like this and... oh, it's not defined. Let's go up here and run all. And guess what, I put the wrong path in my path. It doesn't matter what the path is, I just want to show you that this is where I have my path. I had it pointing at the wrong place, and that's why it couldn't find the file. But here you can see it found Microsoft, and it has adjusted close as well as daily return data, as well as all the dates. Whenever you see the date down there, it means it's the index, and these are just separate columns.

All right, so how do we read data from Excel? Well, I'm going to define another function and I'm going to
call this get DF from Excel (it doesn't matter what I call it), and I'm going to pass in whatever the file path is to grab that file. Grab this, drop that right there: df is equal to, and then what you do is go pd.read_excel and pass in the location of your file. We're just going to copy all of this stuff up here because it is exactly the same, so go up here, go over there, paste that inside of there, and there we are, looks good.

Okay, so I want to get that information from an Excel file, and I have this guy right here. I'm just going to define the file, and it can be any Excel file, it doesn't matter what it is. This one is stock sectors, so I'm going to go sectors, since that's what's located inside here, obviously that's what it says. Then I'm going to say get DF from Excel, copy, throw this right there, and put the file inside of there, and then I can say sectors like this, and it's going to automatically grab that Excel file and create a data frame for it. Very, very useful.

We're also going to be able to read data from HTML. I'm going to jump to a Wikipedia page, and what's really awesome is, let's say I want to get a list of all United States governors in this table. Is that complicated? Actually, it is not. I just plug in whatever the HTML location is, and then I can run it and get all of this information. But say you want this, and you do not want all this other stuff that's inside of here. How do I do that? Well, I can Command-Enter and drop down here and simply say, well, this is the first table and I want the second table, zero-indexed just like before. So I come down here and put brackets and 1, and boom, now I have a data frame with exactly the information I am looking for. There is all of it, and I'll show you how to go in and manipulate all this information as well, so that'll be really useful. Now what I want to do is
show you how to get demographics data. There are tons of different tables inside of here, and I'm specifically trying to get this vital statistics one, which is live births and deaths and so forth. One thing I can do is come in here and search for a word that is in the table, which is really awesome, and it allows me to go and get more information. So what can I do? Just like we did before, I'm going to say d_data is equal to pd.read_html, and just throw in the location, and then after that I can say match equal to, and the phrase I'm looking for in the table is Average population, and it's going to give me only tables that have that. Let's see if I got a match with just that. Okay, no, I got multiple different tables, but if I look through here for the information I'm specifically looking for, which is actually this, it turns out I just want the first table. So I go like that, boom, and now I have all of the information, but it's very dirtied up and a mess. Don't worry, I'm going to show you how to fix things like this, and things like this, and all of these other problems.

The first thing I want to do is replace the spaces in the column names. When there are spaces in the column names, it makes it harder to work with that data, so let's get rid of those spaces. I'm going to go d_data.columns, if I can spell columns right, is equal to, and what I can quite simply do is go x.replace and swap any spaces inside of there for underscores. If I want to cycle through every single column name, I just say for x in d_data (d_data is what I called it) .columns, and that will give me all of those. Then I can say d_data like this, and you can see that, magically, there are now underscores everywhere there was a space. What else do I
want to do? Well, I want to remove characters in the column names: I want to remove these guys right here, and these guys right here. How do I do that? I'm going to use regular expressions. If I want to remove parentheses and everything that's inside of them, I could say d_data.columns is equal to d_data.columns, and call .str.replace, and then this is a regular expression. What it is quite simply saying is that I want to remove parentheses, and whenever you work with parentheses you have to put a backslash in front, because otherwise they are treated as special characters. So this right here is really just this: it's saying search for this parenthesis and that parenthesis, and remove those along with any characters that reside inside of them. How you do that is you go dot and star, and that means match everything. Anytime it finds all of that, I want to replace it with absolutely nothing. So let's come in and say d_data (I don't know why that popped over there, let's just drag it over here) and boom. Whoops, let's see if I can figure out what I did wrong here. Oh, I need to put in another one of these. Oh, I know what I need to do: get rid of that, and I need to put quotes here to close that off. Is that it? Yes.

All right, so I got rid of the parentheses. Now I want to get rid of these nasty brackets, so I can just copy this and basically do the same exact thing. I'm going to copy this, put it up here, and in this situation I just replace the brackets, so it gets rid of the brackets and everything inside of them. I want to get rid of that, though, and I want to get rid of that also. Run it, and now we got rid of the brackets, and now we have data that is very easy to work with.

What else do we have, problem-wise? Well, we have this column called Unnamed, and it's actually here, and we have all this garbage stuff that we want to get rid
of down here. So how do we rename columns? I'm simply going to get the column that I do not like. I'm going to go d_data and say rename, and you could add additional column names between the curly brackets that you're going to see here in a second. I'm going to say columns, and the column that I want to rename is this one called Unnamed, so I'm going to grab it, copy that, drag it up here and drop it inside of there, and say that I want it to be renamed to Year. Quite simple: d_data, boom, and now it is Year instead of nothing.

What else do I want to do? I want to remove these characters here in the column values that are also junking up my data. I can remove those basically the same way I removed the stuff up here, so I'm going to copy that, come down here and drop it right inside of there. In this situation I'm going to go Year; you can reference your column data by its name. So there is Year, and d_data, and again it's not going to be columns, it's going to be Year. What do I want to replace? I want to replace the brackets, with anything inside of them, with nothing. Run it, and now you can see all that extra junk is gone. Good stuff.

So what would we like to do now? Let's say that we want to select columns. Let's find out how many children were born in these years. I can say d_data, dot, and just Live_births, the column name, and it's going to give me all of those right there, so that's very useful. Let's say I would also like to get the deaths. I can get column information in another way: I can go d_data, brackets, and Deaths, and it's going to give me that. One thing here is that it's not very useful to not have the year next to this, so I would like to make the Year column the index: d_data
like that, set_index like that, whoops, like that, and I target Year, and I say inplace, which forces this to be done to the data frame itself, and d_data, boom, and, oh yeah, d_data set... I want this to be .set_index. There you can see that now the year is the index, and it's down here, and that proves it. Good stuff.

What else would I like to do? I'd like to be able to grab data from multiple columns and display it. No problem: just d_data, and then you put multiple brackets here, and you say Live_births, and we can also go Deaths, oops, if I spell that right, Deaths, there we go, and whoops, forgot to put the commas on there... there we are. Okay, so Live_births and Deaths, and we got all of that awesome information.

Let's say we would like to grab a row of data. How do we do that? We can simply say d_data.loc and use our index. Let's say we want 2020, and we got it; that gives us all the data specific to 2020.
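The file-reading helper and the column cleanup described above can be sketched as follows. Everything concrete here is an assumption for illustration: the helper signature, the throwaway DEMO.csv file, and the tiny made-up table standing in for the scraped demographics data (its column names and numbers are invented). The sketch also returns None on a missing file instead of using pass with an else branch.

```python
import os
import tempfile

import pandas as pd


def get_df_from_csv(folder, ticker):
    """Read <folder>/<ticker>.csv with Date as a parsed index; None if missing."""
    try:
        df = pd.read_csv(os.path.join(folder, ticker + ".csv"),
                         index_col="Date", parse_dates=True)
    except FileNotFoundError:
        print("File doesn't exist")
        return None
    return df


# Write a tiny throwaway CSV so the helper has something to read.
with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, "DEMO.csv"), "w") as f:
        f.write("Date,Adj Close\n2020-01-02,10.0\n2020-01-03,10.5\n")
    demo_df = get_df_from_csv(folder, "DEMO")
    missing_df = get_df_from_csv(folder, "NOPE")

# Made-up stand-in for a messy scraped table.
d_data = pd.DataFrame({
    "Unnamed: 0": ["2019", "2020[a]"],
    "Average population(x 1000)": [5000, 5050],
    "Live births": [1000, 950],
    "Deaths[b]": [800, 900],
})

# Replace spaces in column names with underscores.
d_data.columns = [x.replace(" ", "_") for x in d_data.columns]

# Strip "(...)" and "[...]" from the column names with regular expressions.
d_data.columns = d_data.columns.str.replace(r"\(.*\)", "", regex=True)
d_data.columns = d_data.columns.str.replace(r"\[.*\]", "", regex=True)

# Rename the unnamed column, then clean bracketed notes out of its values.
d_data = d_data.rename(columns={"Unnamed:_0": "Year"})
d_data["Year"] = d_data["Year"].str.replace(r"\[.*\]", "", regex=True)

births = d_data.Live_births      # attribute-style column access
deaths = d_data["Deaths"]        # bracket-style column access

# Make Year the index, then select several columns and a single row.
d_data.set_index("Year", inplace=True)
both = d_data[["Live_births", "Deaths"]]
row_2020 = d_data.loc["2020"]
print(row_2020)
```

In this sketch the Year values stay strings after the cleanup, which is why the row lookup uses "2020" in quotes; if the years were converted to integers, the lookup would use 2020 without quotes.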
All right, wonderful year, thought I would revisit it. Also, you saw up here before that it says there are 86 rows. Let's say I would like to get 2020 using the row number. How can I do that? I can just go d_data and .iloc, index location, and since there are 86 of them, it's going to be 85, and sure enough, you can see that we got the same piece of information.

Now I want to add a column to this data. I would like to create a column that shows population growth for each year. I'm going to say d_data and just create a new column. How do I do it? Population growth, with an underscore in there, and then I can quite simply say d_data, get the live births and subtract the deaths from it, so d_data and Deaths like this. There we are, and if I say d_data, we now have a new column called population growth, and you can see how all of that works out. You can also see that the population growth is actually falling, and it was particularly bad in 2020, obviously.

What else would you like to do? Well, we want to delete a column; that's also pretty useful. Let's say that we found population growth to be quite depressing and decided that we do not want it in our table. You just say drop, and you put population growth, which is the name of it, and then you have to say axis is equal to 1 to make it drop a column, and inplace forces it to affect our data frame. Then d_data, and if you run it again, you see that it is no longer in our table. So, also good.

How do we delete a row? Let's say that we do not like the year 1935 and we want it to not exist anymore. We just say drop and 1935, exactly like that, and the difference is that you are going to use a different axis here, and other than that everything else is
the same, because we're dealing with rows. I think I've got that right there; yes, this is going to be axis 0, inplace True, and then d_data, boom, and there you go, 1935 no longer exists. Sorry, 1935.

Another thing: we often want to manipulate data. Let's go and get some new data. I'm going to specifically get the list of countries by GDP, and it's the second table; you're well aware of how this works because you've seen me do it. I'm going to show you some new ways of working with this data so you're completely aware of how to clean up your data, one of the number one skills you must have.

So here we have a multi-level column name list. We don't want that; we want to get rid of it altogether. First off, let's see what those column names are. If you want to print out your column names, you can just say for column in, and what did I call this, I called this c_data, so in c_data.columns, and then I can just say print column. You can see that we have these multi-level columns, and we don't particularly like them, so I want to delete the top part of the multi-level column name. How do we do this? We can just say c_data.columns is equal to c_data.columns.droplevel, and there we are, and c_data, and you can see that we got rid of that. So that's useful.

What else would we like to do? We only want to keep a column if its name hasn't been used prior. Let's say we want to keep country, region, estimate and year, but we do not want these other estimates and years; these are from different data sources. Let's say we only trust this one and want to get rid of any other duplicates that lie inside of there. How do we do that? Well, c_data is equal to c_data and location, like
that, .loc, and this colon means we want everything from the beginning to the end. What this is going to do is keep column names only if the name hasn't been used prior. So we'll go like this: c_data.columns like that, and duplicated like that. There we are, and c_data, and whoops, I put loc in here two times, get rid of that, and there you can see we got rid of the duplicate columns. This stuff happens all the time, so I'm just showing you all the different ways of messing with this data.

What else would we like to do with our data? We have NaN values down here, and we want to get rid of any rows that have NaN values. I'm going to say c_data is equal to c_data, and c_data again inside, and you can see Estimate and Year in there, so I'm going to go c_data and say Estimate (whoops, this has to have an opening bracket like that), so Estimate in quotes, and .notna, and then close off this bracket, and c_data, and there you can see we got rid of any row that had a NaN value inside of it.

What other problems do we have here that we would like to clean up? I would like to remove any brackets that are inside of here; see these brackets with that extra junk inside of them. I'm actually going to come up here and use what we used previously. Where is it? There it is. We'll grab this guy right here, scroll down, paste it inside of there, and then we just have to change a couple of things, actually very little. This is going to be c_data, of course, and this is going to be c_data, and in c_data it's also called Year, right, uppercase? Yep, so that works out well. Do I need to change anything else? No, I think that's all I
need to change. And there you go, we got rid of that extra junk that was in the year. What else would we like to change? Well, I'd like to change this from country territory to just country. I don't know if that's necessarily the best thing to do; I'm just doing it to show you different things. I'd also like to change estimate into GDP, just to show you how to change multiple column names. So you just say rename, and columns is equal to, and it is country territory; I'm just going to copy this right here and paste it down there, and I want to change it to simply country. Then you put a comma and you can put in more: estimate, I want to change that to GDP. And c_data, and you can see that it did not work. The reason it did not work is that I need to add inplace is equal to True, and now it did work. All right, good, so now you've got an example of that. Another thing I'd like to do is remove these stars that are next to the country names, so let's do that as well. That is in the country column, so c_data, country, is equal to c_data, country, and I'm going to say str and replace, and I will get rid of the star and replace it with nothing. And c_data, and boom, those are all gone. Now something else that's really awesome: let's say I would like to find out the mean GDP by region. What you do is use something called groupby. So I'm going to say c_data, and groupby, and we are specifically looking for region, and then dot mean, and boom, you can see that quite simply we can get the mean GDP by region for all these different countries. You can also get the median, so I'll just copy this, and standard deviation, you can do all this stuff, but we'll do the median like that, and boom, you can get that information as well.
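The cleanup steps described above can be sketched roughly as follows. This is a minimal sketch, not the exact notebook code from the video: the table contents, the column names ("Country/Territory", "Estimate", "Year", "Region") and the bracket-stripping regular expression are all assumptions made up for illustration, and the rename is done by reassignment rather than with inplace=True.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the scraped table (all values are made up).
# Note the duplicated "Region" column, the NaN estimate, the bracketed
# footnote in Year and the stars after some country names.
raw = pd.DataFrame(
    [
        ["United States*", "Americas", 100.0, "2020", "Americas"],
        ["Canada", "Americas", 50.0, "[n 1]2019", "Americas"],
        ["France*", "Europe", 60.0, "2020", "Europe"],
        ["Germany", "Europe", 80.0, "2019", "Europe"],
        ["Atlantis", "Europe", np.nan, "2019", "Europe"],
    ],
    columns=["Country/Territory", "Region", "Estimate", "Year", "Region"],
)

# Keep a column only if its name has not been used before.
c_data = raw.loc[:, ~raw.columns.duplicated()]

# Drop rows whose Estimate is NaN; copy so later assignments are safe.
c_data = c_data[c_data["Estimate"].notna()].copy()

# Strip bracketed footnotes from Year (assumed pattern, not from the video).
c_data["Year"] = c_data["Year"].str.replace(r"\[.*?\]", "", regex=True)

# Rename several columns at once.
c_data = c_data.rename(
    columns={"Country/Territory": "Country", "Estimate": "GDP"}
)

# Remove the literal star characters from the country names.
c_data["Country"] = c_data["Country"].str.replace("*", "", regex=False)

# Mean and median GDP per region.
mean_by_region = c_data.groupby("Region")["GDP"].mean()
median_by_region = c_data.groupby("Region")["GDP"].median()
print(mean_by_region)
```

Selecting the GDP column before calling mean is a deliberate choice here: recent pandas versions raise an error if mean is applied to a grouped frame that still contains text columns.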
let me now give you yet another example before I start covering Plotly. Let's say we want to do some analysis: we have two stores, they sell different types of ice cream, and they sell a certain number of ice creams. What I'm going to do is structure this as a dictionary. I'm going to define my store here, so I'll have store, and this is somewhat of a real-world example. Let's say we have different flavors, and those flavors are going to be chocolate (I'm just going to abbreviate this), vanilla, strawberry, and, let's say, chip, for chocolate chip. We also have sales information, so sales like this, and we sold 26, 12, 18, and 22. So that is how the data for those ice cream stores is going to work out. Now let's say I want to convert this dictionary into a data frame. I say ic_data is equal to pd.DataFrame, pass in my dictionary, and then ic_data, and there you can see that store one sold 26 chocolates, store two sold 12 vanillas, and so on. Let's change this to chocolate just to change things up a little bit. Whoops, chocolate like this, and there you can see: this store sold vanilla and this one sold strawberry, and they sold a different amount of chocolate, but they both did sell chocolate ice cream. So what I want to do here is group my data by store number. I'm going to say by_store is equal to ic_data, and I want to groupby the store. I can then go and get the mean sales by store, so I can say by_store and mean like this, and you can see how that works out: we sold 22 and 17 between these different stores, mean-wise. And let's say that we would also
like to come in and get the sales total just for one store. After we have it organized by store, we can come in and say sum, and the specific location, with .loc, is going to be 1. We can run that and see that one store, which is actually this one, sold 44 different chocolates. That is store one, and this would be store two; you can change that to 2 and see the difference between them. What's also really cool is that you can use multiple different functions to get a whole bunch of information about the data we just input. We can say by_store and, simply by saying describe, boom, we're going to be able to see standard deviation, mean, counts, and so forth. So there you go: that is a rundown of the different things you absolutely must know about pandas, and up next I'm going to talk about Plotly.
All right, so I absolutely love Plotly. It's my favorite library for plotting data; it's just what I use. Basically, Plotly is going to allow you to create about 40-some really beautiful, interactive, web-based visualizations that can also be displayed inside of Jupyter notebooks or saved as HTML files, and it's widely used to plot scientific, statistical, and financial data. In the description for this NumPy, pandas, Plotly video, on GitHub, you'll find the installation instructions. It's very easy: you just copy two things and paste them in. These are the libraries we need to import; I'm not going to waste your time going over that. I'm going to go in and actually show you some stuff. What I want to do here is some simple line plots. They're going to get progressively more complex, and then I'll show you a couple of other plots as well. This is going to have kind of a finance leaning, because that's really what I'm into right now. So what I want to do is plot the
value of a dollar invested over time. I'm going to do it for one stock, and then I'm going to show you how to do it for two stocks. So I'm going to create something called df_stocks, equal to px.data.stocks; I'm going to use some built-in Google pricing data here, and this is where I'm importing this information from. So I'm going to go px.line, df_stocks is where I'm pulling this from, and for x I want the date; that is how it is stored. (I'm going to show you how to work with CSV files and such later on.) For y I'm specifically looking for Google, and I am going to label some things. I'm going to say labels is equal to, and on the x-axis the label on this plot is going to be date, and for the y-axis (don't worry, it's going to get much more complicated as we continue) I'm going to do value of dollar over time. We can go like that, and you can see that if we invest one dollar in 2018, then out here at the end of 2019 it is going to be worth $1.21. So pretty cool, nice growth of your money. Now let's say I would instead want to plot multiple lines. How hard is that to do? It's actually extremely easy. Let's just copy this and paste it inside of here, and y this time is going to be two values, so you can see the difference between investing in Google and Apple over this time period. So go GOOG and AAPL, there's Apple, and that's it. If we want to throw a title on this as well, we can just come in here and say title is equal to Apple versus Google, and we will see which is better. You can see there, indeed, it's Apple (you can also see it printed out this little key), and if you invested in Apple instead, you would have a dollar and 67 cents. So that's a very simple line plot. Let's get more complicated. Let's say you want to put multiple different plots on here, and what I also want to do
is show you some different line styles. I'm just going to use some random values. I'll have x1 equal to, and I want to use linspace to define values from 0 to 1, and I want a hundred of them, so np.linspace, 0, 1, and 100. I'm also going to generate a whole bunch of random y values, so I'm going to go y0 is equal to np.random.randn, 100, and so that they stack on top of each other, I'm going to have this one be five above. The other random y values I'm going to stack differently, so let's change this to a negative. Is that a zero? I hope so; now it is. Change this to one and make this minus five, and paste this here. We'll put three of them on the board, so let's get rid of this for that one. Now, after we have our values, we're going to plot them. How you do that is you define your figure object by going go.Figure, and then what I want to do is add multiple different lines using different styles onto the same figure. To do that you go figure, add_trace, and go.Scatter, and we're going to be using the same x value for all of these, so x is equal to r_x1, and y is equal to r_y0. Let's also change the style of the line; you do that here. I'm just going to do a simple line, and you just say lines. If you also want to define a name for it, you go name is equal to, and I'm just going to call this random one. I'm going to create one for each of these, so paste and paste. That stays the same; this is going to be one and this is going to be two. We could also do little dots, which we call markers, like that, or we could quite simply just say markers. Change this to random two and this to random three, and it's going to plot all those out on the screen
at the same time, and you can see exactly how those look. So that's just some of the different styling that is provided; kind of neat stuff. Now what I want to do is add even more detail to my plots, so I'm going to do a plot of Apple stock data. I'm just going to go and grab it; I know it's here, so I don't need to do any type of exception handling (famous last words, I know). So pd.read_csv, and the file is called AAPL.csv. We've got that, and now I'm going to define my axes for the plot. x is going to be aapl_df, and I specifically want the date information (what I'm going to do here is have sliders and a whole bunch of other things), and y is going to be aapl_df, and I want adjusted close for that. So I've got that set up. If you want to know what the aapl data frame itself looks like, there it is; that's the data frame I'm going to be pulling information from. If I now want to draw this, I have to go go.Figure again to create the figure that I'm going to add my values to, and I'm going to say figure, add_trace, and go dot, and I'm going to do a scatter plot, with x equal to x and y equal to y. Now let's say I want to add a slider and I also want to change the title on the x-axis. I go figure, update_xaxes, and I can define all kinds of different things; I'm just giving you a rough idea here. (Whoops, didn't want to do that.) If I want to show a slider on it, which is going to look really awesome, I can say rangeslider_visible is equal to True, and if I want to give it a title, I can say title is equal to, and I can give directions, something like zoom on dates using slider. I also want to add a title to the y-axis, so figure, update_yaxes, and title is equal to stock price, and
this is going to be in dollars, and I need to put this inside of quotes, however, so let's just select that and go like that. There you go; run it, and now you're going to see that information. Oops, did I accidentally hit it too many times? Yes, I did. Let's click it again. There we are. So this is going to print out all of Apple's stock data, and there's a neat little slider down here. Let's say I wanted to come to a very specific period of time and analyze it (I'm going to show you other ways to chart): you can see that I can zoom in on any of these specific parts, and I can go to any specific date and it's going to give me the price for Apple stock. So, awesome. Let's get more detailed. What am I going to do? I'm going to go really crazy this time. I'm going to copy part of this up here, so I'm going to copy this part right here, through adjusted close, and let's get this, go.Figure, and I'm also going to get this guy right here, add_trace, so copy. Just to mix it up, I'm going to use Microsoft now as an example, so change this to Microsoft, and Microsoft up here; date and adjusted close are both the same. Change this to Microsoft like that; that's the same, and the scatter plot is also going to be the same. So we run it, and now you can see here is the Microsoft data. Now, one thing that is really cool is that I can add in a range slider, but I can also provide buttons that allow me to click and jump to different ranges, which is so useful, and this is what it looks like. What you do is you go update_layout, and xaxis is a dictionary, and you can do rangeselector, and then inside of here you have a dictionary of buttons, and you can label them. Let's say one of the buttons, when it's clicked on, is going to use steps of days, and 10 steps of days. This is just simply a label, and
this is going to show you the last 10 days of the stock. Here, let's say we want to work with months: we want one month and we want to display one month; that's what that means. Six months: six, month. Then year to date, and year: we want to show one year. We can change this to two years, we can change it to five years, we can do whatever we would like. If you want to show all the stock data, you just put all inside of there. And here I'm just saying yes, I want to show the range slider, and date. I also have the title, I added the y-axis title, and now what you can do is this: you have your slider, but you can also simply go to one year of data, year to date worth of data, six months of data, one month of data, or ten days of data. You can just click on these simple buttons and get very specific information, so it's very useful.
Now what I want to do is talk to you about candlestick plots, which are very, very useful. Basically, candlestick charts (I'm going to abbreviate this) are useful because they show the open and the close, and they also show the high and the low, all in one place. I want to work with Apple stock, so, well, let's first show you aapl. Whoops, that's aapl_df; yeah, there it is. We obviously need this data: you need the low, the high, the open, and the close to be able to do this with candlestick charts. Basically, a candlestick chart can help you sort of sense the emotion involved around a stock in the near term, and that is why they are found to be very useful. I'll get more into that as I get more into technical analysis and so forth. So I'm going to have the x-axis be date. I need to define my close, so close is equal to aapl, and that's going to be adjusted close. I also need to get the high, which is going to be equal to, and I want to just copy this right here and paste
that inside of there, and you can see, of course, that it has the column name of high. Let's do the low, which is equal to, there we are, and we can just type in low. If we want to get our open, let's just call this open price, equal to, paste that inside of there, and change this to open. Now that we have all of those set, we go figure is equal to go.Figure, and then we can go figure, add_trace, and go.Candlestick, and then just go x is equal to x, high is equal to high, low is equal to low, open is equal to open_p, and close is equal to close. And: name aapl is not defined. Where did I do that? Oh, aapl; I should have put _df. So _df, and _df, and _df, and _df, and there we are. Run it, and there you can see our candlestick chart. It's not very useful because there's so much data, but I can use my little scroller to come in here and scroll through this data. I'll get more into why candlestick charts are awesome later. What I'd like to do now is update my layout with our buttons, and what's awesome with Plotly is that everything's the same, so I can just go up here, copy the update_layout, come down here, and paste it inside of there, and it will automatically work. Boom, and now I have the ability to go and do all of this extra stuff. Some of those aren't going to work, though, because I have a limited amount of information, which is basically just 2020.
All right, what else do we want? Well, let's say that we would also like to come in and add different titles. I can say figure, update_layout, and xaxis_title, and I want to say dates (whoops, hit the wrong button), date like this, and I want the yaxis_title to be equal to stock price. Whoops, I keep forgetting to put the quotes inside of there. There we go, and I bet you I forgot it there also; yes, I did. All right, stock price. Then let's say we want a title for our whole chart: we go title is equal to Apple candlestick chart. And there we are, we've added even more capabilities to it. What else would I like to do? Well, let's say that I would like to really target the whole pandemic period, where the stock sort of tanked because of that, which is basically in this area right here before it started to go back up again. We can put annotations on this. We can say figure, update_layout, annotations is equal to, and we go dictionary, and we're going to say where we want to place the annotation. That is going to be x, and we want to place it on the date that is approximately when the stock market started reacting to the pandemic. I want this to be around the price of 85 dollars on the y-axis; that's where it's going to point, so y is equal to 85. I want the text to read pandemic affects market, to sort of point that out in the chart, and then I'm going to put my xanchor for this on the right. Run it, and you can see right here is a little annotation that points at where that affects it. Now let's say that I would like to put a line underneath here that shows the entire time in which the pandemic negatively affected the stock market. I could say figure, update_layout, and I can say shapes is equal to, and dictionary, and I can
just plug in dates. I'll say that I want this line to go from this period of time right here, so copy and paste that right there, and I want it to go through, let's just throw this inside of here, and let's say that the stock market was affected negatively by the pandemic through this period of time. Then I will define y as being equal to one, which is basically just going to define the size of it, so y is equal to one. Run it, and now we should have, yes, we do, a line that goes right down here and sort of defines that area. It's just a demonstration, not the greatest thing. You know what would look really nice? If I covered that entire area with a low-opacity rectangle. I'm going to do that. I'm going to say figure dot, and you can go add; it's so awesome what you can do with Plotly. As my tutorial continues, where I cover technical analysis and so forth, I'm going to get into more awesome things you can do with Plotly; this is just the tip of the iceberg. So equal to, and again we're going to do the same thing, so I might as well just copy this. I'm going to copy that, and I'm also going to copy this, and just paste it inside of here. Whoops, and this needs to be x0 is equal to. Then I'm going to say that I want my fill color to be something like light salmon, and I want a lower opacity so you can see the chart through it, so I'm going to say opacity is equal to 0.5, which is 50 percent. Then I'm going to say layer is going to be equal to below, and I'm going to say that I do not want a line width for this rectangle that I'm defining. If we run it, you can see that we created that as well. So there's all of this stuff, and I'll be getting more into it with technical analysis and so forth, and on top of that I provided some other bonus stuff here. Well,
you know what I'm going to cover? I'm going to cover open-high-low-close charts as well; why not? Candlestick charts have a rectangle (let's zoom in here so you can sort of see it) that is going to represent the high, the low, and the close, and you can see all of that data right here, along with the wicks, the little lines coming out of the top and the bottom that are sometimes referred to as shadows. An open-high-low-close chart, by contrast, is going to use ticks to the left and the right. Here, let me just show you. Basically we can just say figure dot, and let's just copy that, why not; it's almost exactly the same. So let's come in and go figure, grab this guy right here and copy it, come down inside here, and paste it inside of there. Everything else is the same: add_trace, and the thing we do differently is we say Ohlc, and we have the high, low, and so on; everything's the same. There you go, and there we have our little chart. I'm going to click on this so that I can see the whole thing, and there it is, and if I zoom in you can see these a little bit better. So zoom in like that, and maybe try to find one that's pretty big. There we go, that one's kind of big, so let's zoom in, and zoom back over here. What you can see here is that you have your opening price on the left; the bar up here represents the high, here the low, and this is the closing price. Once again, it gives you the option of seeing momentum depending upon the length of the line, so it's very useful. And one more thing: I'm going to show you how to do multiple plots by merging the different types of plots that we just covered. So what am I going to do? Well, why don't I just leave this here; I'm going to add to what I have here already. So I have the OHLC chart, and then I'm going to say a_x
is equal to aapl_df and the date (I'm going to put the regular chart on top of it), and a_y is equal to aapl_df, and I will have my adjusted close like this. Then I'm going to add that as well, so we'll do figure, add_trace, and go.Scatter: x is going to be equal to a_x, and y is going to be equal to a_y. I'm going to have this be a line, so I'm going to put some different attributes inside of here with a dictionary. Let's say that I want the color of the line to be blue (oops, like this, color equals blue), I want the width of the line to be equal to 1.5, and I want it to be dotted so that it's a little bit easier to see, so dash is equal to dot. I think I got everything right; let's run it, and now you can see that it has both of those pieces of information. It looks kind of cool, and we can zoom in here and look at it. So there you go, guys. Hopefully you find that useful: a rundown of the most common things you're ever going to do with NumPy, pandas, and Plotly. More videos are coming with real-world examples, and like always, please leave your questions and comments down below. Otherwise, till next time.
Info
Channel: Derek Banas
Views: 7,754
Id: LOQHYn7BLAg
Length: 70min 57sec (4257 seconds)
Published: Thu Nov 04 2021