How to Scrape Stock Prices from Yahoo Finance with Python

Video Statistics and Information

Captions
In this video I'm going to show you how you can create your own stock price scraper using Python. We're going to be scraping Yahoo Finance, we're going to be using functions, and we're going to output everything to a JSON file at the end. Some cool stuff in this one, so hopefully you guys enjoy it. Let's get right into the code.

So here's the website. This is Yahoo Finance, and I've just selected this stock. What we're going to be doing is scraping each individual stock page, like the one we're looking at now, for this data here: the closing price and the change. We're also going to be collecting the symbol, which is this here. If we have a look at the URL, we can see that it has the main part of the URL, then /quote/, and then what appears to be the symbol for the stock. There's a little bit afterwards, but if I remove that and load the page up again, we still get the same page. That's a really good indicator for me that we can easily manipulate the URL to get the pages that we're after. So if I change this, hopefully we get... okay, so I've missed some, but if I click on this one we can see there's another one, and the URL is exactly the same format.

So I'm going to hit View Page Source, and I'm going to copy some specific data that's on this page and search within the source for it. This is a good way of finding out whether you can use requests and Beautiful Soup and just parse the HTML when you're web scraping. If it doesn't appear in the source, it's probably loaded dynamically by JavaScript or something similar. So I'm just going to copy this line of text, go to the source again, and search. We can see that it's popping up here. It's quite convoluted and pretty hard to read, but it is here, so this suggests to me that we can actually get the information out the way that we want to,
the easiest, with Beautiful Soup and requests. I'm going to copy the URL and we're going to head over to our code. I'm just going to import the libraries that we need first, so we're going to import requests, and then from bs4 we're going to import BeautifulSoup (still can't spell that). If you don't have either of these installed, just go ahead and pip install them: pip install requests and bs4. That will get you those libraries installed if you need them.

To start working, we're going to set our URL to this one, and we're going to chop the end off like we did before. Then we're going to start actually querying the server to get the data back from this specific web page. So r, for response, is equal to requests.get: we're doing a GET request to the website, and we give it the URL that we've got here. This is going to go ahead and bring that information back down from the website.

I find it's always good practice to set some custom headers, so I'm going to say headers is equal to, and we're going to specify our own user agent like this: User-Agent, and then the value will be whatever our user agent is. I'll just go to Google and search "my user agent", copy this string here, go back to our code, and put that in there. It's quite long and it's going to run off the page, but we don't really need to see what that is. (You might notice I forgot to add headers=headers after the URL in my requests.get call, but don't worry, I do realize eventually and add it in later.)

To check that that's working, I'm going to print r.status_code and run that, and hopefully down here we get a 200 status code. Except I didn't spell BeautifulSoup right, I've missed the u, so don't be like me, and check. Okay, you can just see down in the bottom corner here we did get a 200, so we know that that's working. Another check to do is r.text, so it's going to
bring back all of the text from this page. There's going to be loads of it, but you can sometimes just go through and start to identify a few parts. This looks like a lot of information to me, so I'm going to say that what we're after is going to be there. If you're going to get hit by CAPTCHAs and stuff like that, you should be able to see it in here. Let's move this out of the way.

Now we need to create our soup variable, so soup is equal to BeautifulSoup, and we're going to give it r.text, and I'm going to say we're using html.parser. There are other ones you can use; I just tend to use this one for no real reason other than that I use it. To check that this is working, I like to do print soup.title.text. What that's going to do is search within the soup, which is our HTML, go through all those HTML tags to find the title tag, and give us the text of that element. This is another good test to make sure that it will work, and it does. We've got back exactly what we're expecting, the stock price for the specific one that we were looking at, aspl, so we know that that's all good.

So now we can start actually querying the specific parts of the page that we're after: we want the stock price, the close data, and the change. If we go back to the website, click on the page, and go to Inspect (let's make this bigger so we can all see), and then click on this little tool over here, we can see that there are two bits of data that we're specifically after. I'm going to click on this one first, which is the closing price, and we can see over here it's in a span with a class made up of all of this random class data. Normally this puts me off, but if we copy that and go back to our code, we can say price is equal to soup.find, because we want to search within that HTML. It was a span tag, so we put 'span' in, then a comma, and then we're going to search with a key of class,
because it was a class tag, sorry, class attribute, and then we're going to paste that in there. I'm just going to print out this element and see what we get back and whether it's going to work. Right, so we got the element back, and inside it we can see just here that's the closing price. So now we can put .text on the end, and that is going to get us just the text from that element, which is right there. So we know that that's working.

The next piece of information we wanted was the change data, which is this bit right here. This one didn't appear to actually do anything, so we'll click on it anyway, and then we'll check some other stock before we move on. Same deal: a span with a class. So let's copy that, come back here, and call it change. Again soup.find, it was a span tag, the key we're searching for is class, and the value is this one that we copied, then .text. Now I'm going to print price and change. Let's run that, and we can see we've got our two pieces of data right down here.

To check that this works across a few other stock symbols, I'm just going to find another one. We're going to go to Market Data and just grab some. If you know them off the top of your head, go ahead and type them in; I don't. So we're going to try prem.l and velar. Okay, great, so let's put that in here: prem. Let's run that and see if we get some different data. As you can see, this has not found this element, so what that suggests to me is that part of the class name of this element changes. The error is just saying it couldn't find change, so let's comment that one out and see if the other span element works for different ones as well. Okay, that does, but this change one didn't, so we need to find another solution to that. It probably means that this changes depending on
the color. If you look at this, we can see that the class here has got a negative color in it, so I'm guessing that makes it go red when it is down. If I copy the class from this one and paste it underneath, we can see that although it's very similar (this was the original one), it's got this bit at the end, and it's not finding it. So what we can do is try shortening it and just finding maybe the first part. Let's do that: uncomment that, print change out again, get rid of the end of this, and take it right down to there, so maybe we'll get a partial match. Okay, we still don't.

So we need to find another way around this. What I like to do in that case is go back up the HTML tag chain. We can see we're right down here, zoomed in on these span tags, but if we go one up we can see there's a div with a class that we could try and use, and that's got all the information in it; we can see that's still highlighted. If we can find this div more reliably, we can then just reference the span tags underneath by indexing. I'll show you what I mean in just a second, but I'm going to copy this and have a look on the other page. So it's that, and if we go back to the other price and get this one again, we can see it's exactly the same class. That suggests to me that this class is going to be the constant, whereas these aren't, because of the coloring that they're adding to the class.

So we can copy this and go back to our code. We're going to delete these, because they don't work across multiple pages. We can say price is equal to soup.find, find the div with this class that we think appears to be constant, and then after
this we can say, after we've found that tag, we want to do find_all on whatever's in that tag, and we're going to say it's the span, because it's always a span tag; we found that out. I'll show you: in this div element there are two spans, and we want to always reference both of these. So what we can do is index the first one, this is index zero, that'll be the first one, and we do .text. Then we copy this line, paste it again, call this one change, and change the zero to a one.

So this line here, soup.find, finds this div with this class and then finds all the spans. It's important that we use find_all, because if we only used find it would find just the first one and we wouldn't be able to index them. find_all always returns a list, so remember that, which is why we can index it. So this one's going to go for this span regardless of whatever class the span tag has.

Now if I save that, clear the output so it's easier to see, and run it, we can see that we have got the price and then the change. And if we change our URL back to (was it this one?) there we go, it's worked. So even though the classes change for those span tags, by going up a level in the HTML and getting this div whose class didn't change, then doing the find_all on that in line and indexing the span tags, we can get the data that we're after.

Let's try some other ones. Let's go back to the main page and see what we can try: icon.l and bzt.l. Okay, so let's try those. icon (if it's not obvious by now, I don't know an awful lot about stock prices)... there we go, that works. And bzt. So it's looking pretty good so far. Fantastic.

Now that I'm confident that this is the way to do it and the way it's going to work, we're going to create ourselves a nice function that's going to have all of this in it, that we can call and give it a stock
symbol, and for it to return us the price and the change. So I'm going to delete this, go above our code here, and write def to define our function. We're going to call it get_data, and in here we're going to give it the stock symbol, and then we're going to indent all of this into our function. I'm going to move this up as well. Here we're going to create a dictionary; we're going to call it stock, and we're going to turn price and change into the keys: get rid of the equals and make it into a colon, so that becomes the value, and put commas after them so we don't get any errors. Now if we save that and we do return stock like this... I missed the headers earlier, that's my fault. It turns out, as that just goes to show, we didn't actually need the user agent headers, but we'll include them anyway.

So now we're returning the stock data out, we just need to make a quick amend to the URL so it puts the new symbol at the end. We're going to turn it into an f-string, and in place of the code at the end I'm going to put two curly brackets and type the word symbol in there, so it's going to put the symbol we pass in into the URL, and then we're going to get the stock data out. Let's give that a go. If we print get_data and give it this one, this should put that symbol into the URL and we should get the data out. Okay, so that came out at the bottom there. Let's just double check another one, icon, and see that that one works. It does, so we can see we're getting two different sets of data. Great.

So now that we know that works, let's say that there were four or five specific stocks that we were interested in and we just wanted a quick and easy data output of each one at the end of the day, say, or whenever we're interested, as opposed to having to go and open up all
the web pages and have a look through. What we can do is create a list of the symbols that we're interested in, loop through them, and then output the data. So I'm going to create a list, I'm going to call it my_stocks, and we're going to give it the ones that I can remember, to save me having to look them back up again: icon, prem, that was one, and one more, what was it, bzt. Okay, so there's four there; that will do.

We're happy that our function is working and that it returns the data we're after, so we can just loop through this list and save the results to a new list, and we'll have a list of dictionaries with all the data in. We need to create our blank list; I'll do that under here and call it stock_data. Down here, I'm going to collapse the function for now, because we're happy that that works. We're going to create a simple for loop: for item in my_stocks, and we're going to say stock_data.append, to append each one to the list, and we'll say get_data and item. What this is going to do is use our get_data function, which we verified works with the symbol, and for every stock symbol in this list we're going to reference it with item, put that into the function, and append the result to our blank list. I'll put a quick print statement in underneath so we know that something's working: print 'Getting' and then item. Then outside of our loop, once that's all finished, we can do print stock_data.

Okay, so I'm going to run that. It should take a second or two whilst it goes out and gets all of the data. And we can see that it's returned the prices and the change for each one of these; we can see all the information there. Great, so we know that that works. It's no good just being printed to the screen, though, so we're going to do some
kind of output for this. I'm going to use JSON, so we need to import json into our code. This is in the Python standard library, so there's no need to pip install anything. This is nice and simple: because we know that our data is in good shape, since we created the dictionaries, we can simply use a context manager to open a new file and write everything to it. To do that we use with open as our context manager, and we use json.dump to dump the data into that file. So we do with open, and we give it our file name, which is just going to be stockdata.json, and we need to give it a 'w' for write. If you're opening a file and you just want to read it, you can put 'r' in there, but we're going to be writing to it, so we need a 'w'. Then we do as f, and underneath here we do json.dump, give it our list, which was stock_data, and tell it to save it to the file that we've just opened, which is f. Right at the bottom I'll just print 'fin' so we know that it's worked. So I'm going to run that, and hopefully at the end of it we get 'fin', and under our files we get a new JSON file with the stock data in it. So we've got the symbol, the price, and the change for each stock that we have scraped.

So that's it, guys. Hopefully you've enjoyed that: a slightly different output, a similar sort of approach, but I think there are a couple of cool concepts in there that you might find useful, even if you've done more HTML scraping before. If you haven't done much web scraping, please consider subscribing to my channel. I've got lots of videos already on the subject there and more to come. Main videos are on Wednesdays and Sundays; I might rotate them around depending on my actual schedule at the moment, but on both of those days hopefully there'll be some kind of content for you, but certainly one a week, so
stick around for that, and thank you for watching. I'll see you in the next one. Bye.
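
The approach described in the captions (find the stable parent div, index its span tags, wrap it in a function, loop over symbols, dump to JSON) can be sketched roughly as follows. This is a minimal reconstruction, not the code shown on screen: the class name in PRICE_DIV_CLASS, the User-Agent string, the exact URL host, and the split into parse_quote and save_stocks are all assumptions made for illustration. The real class value has to be copied from the page with Inspect, and the page layout may well have changed since the video was published.

```python
import json

import requests
from bs4 import BeautifulSoup

# Hypothetical class name, used only for illustration. The real value must be
# copied from the quote page with the browser's Inspect tool.
PRICE_DIV_CLASS = "quote-header-price"

# Placeholder user agent (assumption); substitute your own browser's string.
HEADERS = {"User-Agent": "Mozilla/5.0 (placeholder user agent)"}


def parse_quote(html, symbol):
    """Pull the price and change out of one quote page's HTML."""
    soup = BeautifulSoup(html, "html.parser")
    # Go one level up from the span tags: the parent div's class is assumed
    # to stay constant, while the span classes vary with the up/down color.
    div = soup.find("div", {"class": PRICE_DIV_CLASS})
    if div is None:
        raise ValueError(f"price container not found for {symbol}")
    # find_all returns a list-like result, so the spans can be indexed.
    spans = div.find_all("span")
    return {
        "symbol": symbol,
        "price": spans[0].text,
        "change": spans[1].text,
    }


def get_data(symbol):
    """Download one quote page and parse it."""
    # URL pattern as described in the captions (/quote/ plus the symbol);
    # the host name here is an assumption.
    url = f"https://finance.yahoo.com/quote/{symbol}"
    r = requests.get(url, headers=HEADERS)
    return parse_quote(r.text, symbol)


def save_stocks(symbols, path="stockdata.json"):
    """Scrape every symbol and write the list of dictionaries to a JSON file."""
    stock_data = []
    for item in symbols:
        stock_data.append(get_data(item))
        print("Getting:", item)
    with open(path, "w") as f:
        json.dump(stock_data, f)
    return stock_data
```

Keeping the parsing in its own function means it can be checked against a saved copy of a page without sending a request. Calling save_stocks with a list of symbols would then produce stockdata.json, provided the placeholder class name has been replaced with the real one.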
Info
Channel: John Watson Rooney
Views: 12,352
Rating: 4.9693875 out of 5
Keywords: scrape stock prices python, web scraping financial data, scrape stock data, scrape stock data python, scrape stock market data, how to scrape stock market data, web scraping yahoo finance, yahoo finance python
Id: 7sFCOunKL_Y
Length: 20min 5sec (1205 seconds)
Published: Sun Oct 25 2020