Financial Data with Python: yfinance

Video Statistics and Information

Captions
Welcome back, everyone, to Data Science for Everyone. Today we're going to be looking at financial data with Python. Let's get started.

While I'm waiting for this, I'll do a little bit of discussion. Again, we have our standard imports: pandas as pd, numpy as np, seaborn as sns, matplotlib.pyplot as plt, and then our %matplotlib inline. We need to pip install yfinance. yfinance is an API for Yahoo Finance. The one people usually use is pandas-datareader, but it's a little bit finicky, so recently I've been using yfinance more. It has some eccentricities that we'll talk about shortly. Once it has successfully installed, make sure to do import yfinance as yf, and then we'll get started on the rest.

The next thing we're going to do is a basic example, which I'm taking straight from the GitHub page. From there we'll move on to a more complex example, where we'll even grab some data and do some data cleaning as well.

So let's start with the basic example. We've already imported yfinance as yf, so let's do Microsoft with yf.Ticker. This does expect that you know ticker symbols and how to look them up; check your old finance classes, et cetera, for that. We pass in MSFT and run that, and this gives us a Ticker object. If we do msft.info, it grabs all kinds of stuff, and you can see that it's actually a dictionary. There's all kinds of static data we can get: what currency it's in, what country it's in (the United States), information on the company officers, day highs and lows, even the company's logo, its legal and short business names, and the zip code where it's located.

Now we can also grab historical data. I'm going to go to another line, and we can grab the history with msft.history. Then we set our period. You can grab any period, for example one month, and you can set the interval to a day or anything else, but what we're going to do is period="max". The default interval is daily. So I'm going to run this, and we can take a look at hist. I probably shouldn't have named it hist, because that may cause us some problems later on. But you can see the type of historical data it gives: this particular Yahoo data goes all the way back to March of '86, daily all the way up, with information on dividends, stock splits, and everything else over the whole period, which is great.

That's all the historical data, but let's go back to msft and look at actions. Actions in this instance are the dividends or stock splits, so it would be these two columns here. If we run this... whoops, it's actions, not action. Notice it grabs just those two columns, which we can't see in the full history because it's truncated there. So we can look at just the actions, just the dividends and the splits, as we want to. Notice there are stock splits on this day, and there are stock splits in the '90s, so it won't be continuous at all; it just gives you the days they happened. If you just want the dividends, you can grab just the dividends, and we can also grab just the splits.

Now we can also grab some very cool stuff. So msft.financials: again, this one doesn't have anything, but let's see if it has any of the quarterlies.
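A minimal sketch of the single-ticker calls walked through so far, assuming yfinance is installed and Yahoo is reachable. The attribute names (info, history, actions, dividends, splits) are the yfinance ones used in the video; actions_from_history is a hypothetical helper that only illustrates that actions are the Dividends and Stock Splits columns restricted to days where either is non-zero. yfinance is imported inside the function so the helper can be tried offline.

```python
import pandas as pd


def fetch_single_ticker(symbol: str = "MSFT") -> dict:
    """Fetch the single-ticker data walked through in the video (needs network)."""
    import yfinance as yf  # lazy import: the offline helper below does not need it

    ticker = yf.Ticker(symbol)
    return {
        "info": ticker.info,                      # dict of static company data
        "history": ticker.history(period="max"),  # daily bars by default
        "actions": ticker.actions,                # dividend and split days only
        "dividends": ticker.dividends,
        "splits": ticker.splits,
    }


def actions_from_history(hist: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep only the days with a dividend or a split.

    Assumes `hist` has the "Dividends" and "Stock Splits" columns that
    Ticker.history() returns; ordinary days hold zeros in both columns.
    """
    events = hist[["Dividends", "Stock Splits"]]
    return events[(events != 0).any(axis=1)]
```

For example, actions_from_history(fetch_single_ticker()["history"]) should list roughly the same days as msft.actions.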
Let's do quarterly financials. For now, nothing. Let's see what else we have, like information on major holders. Here you can see information about the shareholders, which is fairly cool. We can also grab the institutional holders, so here you can see the companies and institutions that hold their stock, which is kind of cool as well. The number one shareholder is Vanguard with, let's see, something like six hundred million shares, which is fantastic.

We can also grab their balance sheet, and again this doesn't have anything. Let's try cash flow: no information on cash flows. Let's see if there's information on their earnings: nothing on earnings. I'm just showing you the kind of information you could potentially grab; some companies have it and some don't. Now look here: they have data on their sustainability. It's interesting to see what type of information you can get, and I'll let you peruse that by yourselves.

Let's also look at recommendations: what firms say you should do with the stock. In the To Grade column, some say it's a buy, some say it's a long-term buy, some say it's overweight. I'm curious what everything in To Grade is, so we select To Grade and call value_counts. Out of 315 rows, 90 say buy and 67 say overweight. Remember that this is also a time series, so we may want to group it by year or something like that first. So you have all kinds of information on whether people recommend buying the company's stock or not, which is fantastic if you want to use this for financial analysis, and if you're a certified financial planner, this type of information is priceless.

Let's also see if they have their calendar. These are their earnings reports, which is nice: the earnings date, the earnings average, the lows, the highs, all of that. We can also look up the ISIN, which is their International Securities Identification Number. Then let's see if there are any options: msft.options gives us their option expiration dates.

This is great for doing a deep dive into a single company, but a lot of times I like to do more large-scale analysis. Again, this is just the basics. We can always plot these as well: we have that history data, so we call .plot on it with kind='line' and figsize=(12, 12).
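The video notes that many Ticker attributes come back empty and that the recommendations table is a time series worth grouping by year. A hedged sketch of both ideas: has_data, probe_attributes and grade_counts_by_year are hypothetical helpers. The "To Grade" column matches the recommendations table shown in the video (2021); newer yfinance releases may return recommendations in a different layout, so check the columns your version returns first.

```python
import pandas as pd


def has_data(value) -> bool:
    """Return True if a yfinance attribute actually holds something."""
    if value is None:
        return False
    if isinstance(value, (pd.DataFrame, pd.Series)):
        return not value.empty
    if isinstance(value, (str, list, tuple, dict)):
        return len(value) > 0
    return True


def probe_attributes(ticker, names) -> dict:
    """Map each attribute name to whether it returned any data.

    Any error while fetching (missing attribute, no data from Yahoo)
    is treated as "no data", since coverage varies by company.
    """
    result = {}
    for name in names:
        try:
            result[name] = has_data(getattr(ticker, name))
        except Exception:
            result[name] = False
    return result


def grade_counts_by_year(recs: pd.DataFrame, column: str = "To Grade") -> pd.DataFrame:
    """Count each recommendation grade per calendar year.

    Assumes `recs` has a DatetimeIndex and a grade column such as "To Grade".
    Rows are years, columns are grades, missing combinations are 0.
    """
    counts = recs.groupby(recs.index.year)[column].value_counts()
    return counts.unstack(fill_value=0)
```

With a live Ticker, something like probe_attributes(yf.Ticker("MSFT"), ["major_holders", "institutional_holders", "balance_sheet", "cashflow", "sustainability", "calendar"]) gives a quick map of what Yahoo has for that company.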
Notice this just looks like nonsense right now, because the plot is dominated by the volume. So what we can do is set subplots=True, which puts every column in its own subplot. I missed something there; here we go. Notice that this covers the whole time period, so if you want to look at, say, financial crises, it's probably better to cut it up into the time periods you want. We can talk another time about handling large volumes of financial transaction data and going deeper into time series analysis.

But I'd like to go through a more advanced example, something more fun. Some people may think this is a drag, but I personally think it's fun. Hmm, hold on. All right, that was a weird fake-out.

So let's talk about advanced financial data importation, or gathering. That was a single company; I have a tendency to like to look at the major financial indices. How would you know what the major indices are? If you're a layperson, you may have your own idea. I actually have a web page saved, so we can have a little fun with this. Here are the symbols for all the major indices of the world; this is the World Indices page on Yahoo Finance. Let's say this website updates from time to time and maybe the rankings change, so I want to use it as my go-to web page. Some of these indices have historical data saved in Yahoo and some do not, and that's okay. We're going to use this as our go-to.
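Going back to the Microsoft history plot for a moment: the fix described above (one panel per column, so Volume no longer flattens the price lines) plus the suggestion to cut the history down to a time window can be sketched as follows. plot_history is a hypothetical helper; kind, subplots and figsize are standard DataFrame.plot arguments, and the column names assume the frame returned by Ticker.history().

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_history(hist: pd.DataFrame,
                 columns=("Open", "High", "Low", "Close", "Volume"),
                 start=None, end=None, figsize=(12, 12)):
    """Plot each selected column in its own subplot, optionally over a date window.

    `start` and `end` are label-based slice bounds on the DatetimeIndex;
    None leaves that side of the window open.
    """
    window = hist.loc[start:end, list(columns)]
    # subplots=True gives every column its own y-axis, so Volume (share
    # counts, far larger than the prices) no longer squashes the price lines.
    axes = window.plot(kind="line", subplots=True, figsize=figsize)
    plt.tight_layout()
    return axes
```

For instance, plot_history(hist, start="2008-01-01", end="2009-12-31") would zoom in on one crisis window; the dates are only an illustration.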
I'm lazy. I really, really dislike having to type things out, so we're going to scrape this website. We create major_indices with pd.read_html to grab the whole data set, and I only want the first table, so we take [0]. If we look at major_indices.head(), notice we have the data that was shown on the page. It has NaNs in some columns, because those held interactive graphics, and we don't want those anyway. Technically all of this other data is useless to us; I want the ticker symbols so we can create our tickers.

So from major_indices we grab the Symbol column for our symbol list. Let's take a look at it. Great, we have the Symbol series. This isn't in the format I want: there's this caret (^) at the front, and I also want to lowercase the symbols and turn them into a list. First, let's do .str.replace, grabbing the caret and overwriting it with nothing, just empty quotes. If I show you this now, notice we got rid of the caret. Next we do .str.lower(), and now everything is nice and lowercase. Then we do .tolist(), which gets us a nice list of all the potential major indices of the world. We're going to call this ticker_list.

The way this ticker list can be used varies. There is yf.Ticker, which is for a single symbol, and then there is yf.Tickers. So we grab yf.Tickers, pass in ticker_list, and run this.
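A sketch of the scraping and cleanup steps, assuming a saved copy (or the live URL) of Yahoo's world-indices page and a parser such as lxml installed for pandas.read_html. The helper names are hypothetical. One caution: Yahoo writes index symbols with a leading caret (for example ^GSPC), and yfinance generally expects that caret, so stripping it as in the video may itself account for some failed downloads later on. The sketch therefore makes stripping and lowercasing optional.

```python
import pandas as pd


def load_index_table(path_or_url: str) -> pd.DataFrame:
    """Return the first HTML table on the page (the index listing)."""
    # read_html returns one DataFrame per <table>; keep only the first.
    return pd.read_html(path_or_url)[0]


def clean_symbols(table: pd.DataFrame, column: str = "Symbol",
                  strip_caret: bool = False, lower: bool = False) -> list:
    """Turn the Symbol column into a plain Python list of tickers."""
    symbols = table[column].dropna().astype(str)
    if strip_caret:
        # regex=False: treat "^" literally, not as a start-of-string anchor
        symbols = symbols.str.replace("^", "", regex=False)
    if lower:
        symbols = symbols.str.lower()
    return symbols.tolist()
```

Usage would look like ticker_list = clean_symbols(load_index_table("world_indices.html")), where the file name is a placeholder for wherever the page was saved; pass strip_caret=True, lower=True to reproduce the video's list exactly.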
Well, we actually need to assign it to something, like ticker_data. If we look at ticker_data, we have a bunch of objects, and it's giving us some errors. This is the standard way they show on their GitHub page, but I'm personally going to delete it, because sometimes it doesn't work very well and gives you back a lot of errors.

What we want to do instead is df = yf.download, passing in our ticker_list, and then we have the period in here. I'm going to write everything out explicitly: I want it daily, so one day. I want a start date, because I don't want everything, so I'll use January 13, 2020. I'm matching this up with some research I'm doing on COVID-19 and financial markets, and my COVID-19 data set starts in January of last year. We also want an end date in 2021, and today is March, I believe, 10th; let me check. Yes, today is March 10th. So I'm going to run this.

We will get some errors here, but don't worry about that: some of these have been delisted, so that data isn't available to us. Let's look at the size of our ticker list with len(ticker_list): there were 39, and notice 29 failed downloads. That's fine. It just means that a particular index didn't have much historical data, or maybe it's been delisted for some reason. It gives you a list of what didn't work, but I'm interested in what did work.

So let's take a look at our data frame now. Notice it's a big one: it has our adjusted close, our close, open, and all the other fields. We can also look at df.columns, and notice that what's actually happening here is a MultiIndex of columns. For every field, from the adjusted close all the way to the volume, it has a column for each index. For example, this one is the Shanghai stock index and this is the Shenzhen; these are Chinese stock indices. You can keep it all or grab only what you need. For all intents and purposes, today I want to keep the adjusted close, so I'm going to call this adj_close.

If we look at the data really quickly, there are a lot of NaNs. There's a lot of missing data, because depending on the time of year and the location, some markets will have missing days. So we need something to clean this data up a little bit, maybe with a threshold. Let's do df.dropna and give it a threshold; I'll give it 10.
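A hedged sketch of the download step. yf.download with start, end and interval is the call used above, and the default dates are the ones from the video. Recent yfinance releases adjust prices by default and may leave out the "Adj Close" field, so auto_adjust=False is passed to ask for it explicitly; treat that as an assumption to check against your installed version. select_field is a hypothetical helper that pulls one price field out of the (field, ticker) MultiIndex columns.

```python
import pandas as pd


def download_prices(tickers, start="2020-01-13", end="2021-03-10") -> pd.DataFrame:
    """Download daily bars for many tickers at once (needs network)."""
    import yfinance as yf  # lazy import so select_field works offline

    return yf.download(tickers, start=start, end=end, interval="1d",
                       auto_adjust=False)


def select_field(prices: pd.DataFrame, field: str = "Adj Close") -> pd.DataFrame:
    """Return one price field with one column per ticker.

    Assumes the columns are a MultiIndex whose first level is the field
    name ("Adj Close", "Close", ..., "Volume") and whose second is the ticker.
    """
    return prices.xs(field, axis=1, level=0)
```

Together, adj_close = select_field(download_prices(ticker_list)) mirrors the selection made in the video.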
This threshold guides how we clean up: it's the number of non-NA values required. With it, a column needs at least 10 non-NA values for us to keep it. Notice that dropna uses axis=0 by default, so we'll set axis=1 for our columns. We also want to grab the adjusted close for each of them, and we also want to drop XAX (axis=1 again), simply because it's not useful data to us. You know what, let me keep it for now, and I'll show you why we dropped it in a second.

So now if we look at adj_close.head(), notice we do have some missing days. We have the Shanghai stock index, Shenzhen, the Dow Jones Industrial Average, another Shanghai one, this one is for Mexico, and I don't remember what these two are, I'm sorry, but we can always go back and talk about them.

Now let's do a little bit of plotting with adj_close.plot, and remember I'm going to use subplots here because... actually, let's do .describe().T first. Here we can see that their minimums and maximums are on wildly different scales. If we plot everything together, we'll see the Dow Jones and some of the others like the Shenzhen, but it's not going to work very well because they're on different scales. Let me show you what I mean: I'll just do .plot with a bigger figsize, 12 by 12.
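The cleaning and scale check just described, as a sketch: dropna(axis=1, thresh=10) keeps a column only if it has at least 10 non-missing values, and describe().T puts each index on its own row so the minimums and maximums are easy to compare. rebase_to_100 is an addition of mine, not the approach taken in the video (which uses subplots): dividing every series by its first valid value is one common way to put indices on a shared scale. All three function names are hypothetical.

```python
import pandas as pd


def drop_sparse_columns(prices: pd.DataFrame, min_obs: int = 10) -> pd.DataFrame:
    """Drop every column with fewer than `min_obs` non-missing values."""
    return prices.dropna(axis=1, thresh=min_obs)


def scale_summary(prices: pd.DataFrame) -> pd.DataFrame:
    """One row per column with its min and max, via describe().T."""
    return prices.describe().T[["min", "max"]]


def rebase_to_100(prices: pd.DataFrame) -> pd.DataFrame:
    """Express each column relative to its first non-missing value (= 100).

    bfill().iloc[0] picks each column's first valid value; days before a
    series starts stay NaN.
    """
    first_valid = prices.bfill().iloc[0]
    return prices / first_valid * 100
```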
So notice we have two up here and some down here, and these almost look flat; we don't see any variation in them whatsoever. That's not useful to us, so we want subplots=True. Now we can see that XAX stops after May 2020; we have no data after that, so it's useless to us. I'll put in a note here: drop XAX due to lack of viable data. We just overwrite adj_close with adj_close.drop for XAX, with axis=1. Now we can plot our adjusted close with figsize... you know what, let's do 12 by 20 so it takes up more of the page, and subplots=True. Whoops, I need to put that in there. That didn't come out the way I wanted it to; let me re-run it.

Now we can see that we have relatively clear data. There are some missing items in here, which we can take care of in a variety of ways. Everything has this dip: this is pre-COVID, or the start of COVID, and we can see how everything dropped. Things start to look better, but then they waver again, with dips and jumps all over the place. The next thing we might want to do is clean up the data further and do some other visualizations. Later on, we can even merge this with the COVID-19 data we've looked at in the past and do some sort of analysis.

Let's talk about some other things. I have just this subset of data, but one thing we like to do is resample our data. For example, maybe you have this data and you want it every four months or something like that. Let me grab just a single index for now, the Dow Jones: from adj_close we want DJI, so now we have dji and can take a look. One thing we can do here is .resample, say quarterly (every three months), and take the mean for each quarter. This gives us the quarterly means for this data set. Right now I'm just resampling and doing the calculation, but you want to make sure everything lines up the right way. You can't just say, "Get me the quarterly data starting today." Well, you can calculate it just like we did, but if you need it to line up with the proper quarterly earnings reports and that type of thing, you have to make sure your data lines up. Otherwise, you can just run it like this.

There are a couple of other things we could do, for example the daily percentage change. Let me create a new variable, dji_pct_change, using just the adjusted daily close: dji divided by dji shifted by one. How best to show you this? Hold on. We could try pd... no. dji.merge with dji.shift(1)? No, that fails because it's a Series. Let me do pd.concat with dji and dji.shift... no, that's not going to work either. pd.DataFrame, then... you know what, it's fine. Let me just show you dji.

This is our normal data set. Now if I do dji.shift(1), it shifts everything down by one. That lets us compute differences and all kinds of other mathematical representations, like the percentage change, which is what we're going to do here: dji / dji.shift(1) - 1, wrapped in parentheses. Then we call dji_pct_change.plot with figsize=(12, 6). Here you can see the daily percent change, so over time you can talk about the volatility of a market. COVID started out bad, we had this panic, and then things started to calm down again. There are some major dips here and there, but overall the market regulated itself and became calmer about this new phenomenon. That's the best use of shift.

That's the percent change, but let's also calculate the log returns for each day; we'll call it log_return_shift. I don't know how many of you have taken finance, but say you want to calculate your rate of return: r_t = p_t / p_(t-1) - 1. That's your normal rate of return, where p_t is today's daily price, p_(t-1) is yesterday's daily price, and r_t is the rate of return. The log return is instead ln(p_t / p_(t-1)). So let's calculate that with np.log(dji / dji.shift(1)). Run this [unclear], and now we can take a look at the daily log returns. I'll grab that plot, change it to log_return_shift, and run it. Notice they look almost identical; you do get slightly less volatility when we use the log.

Now let's look at something relatively important: the histograms. If we do dji.hist(bins=50), this is on the normal price data, and notice there's heavy skew in this data set. I'll copy that figsize again, since I keep using it. Again, you can see there's major skew. That's a no-no whenever you're trying to make financial predictions; we want the data to be regularized over time. That's why we take the percent change or the differences. Let's do it on the log returns: notice everything is kind of squished up, because it's been somewhat normalized. Let's also do it on the percent change so you can see that as well. Notice they're almost identical; both have been squished a bit. So this is something you can take into account: this is how you can normalize data, clean it, and look at your daily returns. We could even compute cumulative daily returns and that type of thing as well.

Now let me see what else we can do that's kind of cool... Thank you, everyone, for watching. If you liked this, please comment, subscribe, and click that like button. We'll see you next time. Bye.
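The resampling and return calculations from the last part of the video, collected into one hedged sketch. quarterly_mean bins by calendar quarter (three months) using pandas' QuarterEnd offset, which sidesteps the "Q" versus "QE" alias change between pandas versions. simple_returns is the shift formula r_t = p_t / p_(t-1) - 1 used in the video (equivalent to Series.pct_change on gap-free data), and log_returns is ln(p_t / p_(t-1)). cumulative_returns, which compounds the simple returns, covers the cumulative daily returns mentioned at the end but not shown. All function names are hypothetical.

```python
import numpy as np
import pandas as pd


def quarterly_mean(prices: pd.Series) -> pd.Series:
    """Mean price per calendar quarter (bins labelled by quarter end)."""
    return prices.resample(pd.offsets.QuarterEnd()).mean()


def simple_returns(prices: pd.Series) -> pd.Series:
    """r_t = p_t / p_(t-1) - 1; the first value is NaN (no previous day)."""
    return prices / prices.shift(1) - 1


def log_returns(prices: pd.Series) -> pd.Series:
    """ln(p_t / p_(t-1)); unlike simple returns, these add up over time."""
    return np.log(prices / prices.shift(1))


def cumulative_returns(prices: pd.Series) -> pd.Series:
    """Compounded return since the first day: prod(1 + r_t) - 1."""
    return (1 + simple_returns(prices)).cumprod() - 1
```

With dji as the Dow Jones adjusted-close series, log_returns(dji).hist(bins=50, figsize=(12, 12)) reproduces the histogram comparison from the video.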
Info
Channel: Data Science for Everyone
Views: 9,802
Keywords: python for finance, financial data with python, time series, yfinance, python, python programming, financial markets, data mining, data cleaning, data science, data science for business, data science for everyone, data science for finance
Id: 7wAQCwdvqqo
Length: 34min 8sec (2048 seconds)
Published: Fri Mar 12 2021