Scraping Historical Cryptocurrency Prices [Python Beautiful Soup & Selenium Tutorial]

Video Statistics and Information

Captions
Hi guys, this is Alice from straight_code. Today I'm going to show you how to scrape historical data for Bitcoin. This tutorial is beginner-friendly, and I'm going to show you two methods side by side: one with requests and BeautifulSoup, the other with XPath and Selenium. BeautifulSoup is a little more friendly; however, some websites may not let you do that, and in that case you can use Selenium and XPath to get the data anyway. The same technique I go through today can be applied to any other website you'd like to get your data from. Now let's get started.

I'll be showing you these methods using coinmarketcap.com as our example. When you first come to this page, you'll notice there is a time option; I simply selected All Time, but you can choose whatever range of dates you want. Today I'm going to show you how to get the date column, and the same approach gets you the open, close, or any other column you like. I'll also show you how to export that data; there are many ways to export it, but I'm just going to show you how to make it into a CSV file.

So first, open up your Python editor. I'm using Spyder with Python 3.6, and I'll put the download link in the description. I'm first going to show you how to use requests and BeautifulSoup to get this data. I like to put the results into lists, so I'll have a list of all the dates, a list of all the highs, and anything else you want. To do that, you simply create a list, which I'll call dates_list. Next, import requests and also import BeautifulSoup. Then you want to make a request to get the page: type r = requests.get() with the URL of the page you want as a string (just copy-paste it in) and run that. Now you can look at the response: 200 means everything's good.
The next thing is to create your soup: soup = BeautifulSoup(r.content). BeautifulSoup is a parser, so you use it to help you organize the HTML you got back. It takes a second to run, but if you then type soup, you should see something like the page's HTML. Looking back at the page, let's try to get a date first. Right-click on the first date and click Inspect at the very bottom, and you can see the HTML structure here; notice that when you hover your mouse over an HTML element, the part of the page it represents gets highlighted. So now I know the first date is a td element in this table, and I'm going to try to get it. To do that I write soup.find("td"): I tell it to go through all this mumbo-jumbo and find the very first thing called td, and I get April 22, 2019. That's the first element. Now let's see if we can find this whole first row: it's a tr element, and to filter by class you pass, in brackets, a string with the class name, which is "text-right" here, then add .text. OK, so we found all of these. If you take a look, the first cell is the date, the second one is the open, then you have the high, the low, and so on. Since I just want the date, what I do is tell it to not only find the row but to keep finding inside it: find("td") with the class "text-left". (I was missing a print and a bracket there at first.) Now you've got the date, but you've also got the td tag and its class; to get rid of that, just add .text. Now you know how to get just the dates. From here, a good way to go is a little loop, so I'm going to teach you a quick trick, a cool little algorithm: I'm going to loop through all of these tr elements, because if you take a look they're all exactly the same, which I discovered when I was looking at the website.
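Put together, the find steps above look roughly like this. It's a minimal sketch that parses a tiny inline stand-in for the page's table so it runs on its own; the class names ("text-right", "text-left") are from the 2019 markup and may have changed, and in real use you would pass requests.get(url).content to BeautifulSoup instead of the inline string.

```python
from bs4 import BeautifulSoup

# Tiny stand-in for the page's table markup (assumed class names).
# In real use: soup = BeautifulSoup(requests.get(url).content, "html.parser")
html = """
<table>
  <tr class="text-right">
    <td class="text-left">Apr 22, 2019</td>
    <td>5309.23</td><td>5436.54</td><td>5263.29</td>
  </tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# find() returns only the FIRST matching element; .text strips the tags.
first_td = soup.find("td")
print(first_td.text)  # -> Apr 22, 2019

# find_all() with a class filter returns every matching element.
rows = soup.find_all("tr", {"class": "text-right"})
date_cell = rows[0].find("td", {"class": "text-left"})
print(date_cell.text)  # -> Apr 22, 2019
```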
What I want to do is go through all of these rows and grab, from each one, the element that has the date inside it. I'll set trs equal to what we found earlier, the tr elements with class "text-right", but instead of find I'll use find_all, which gives me all of them. Then I write for item in trs; at this point trs is a list of all of these row elements, so for each item inside the list I do the second part, the inner find, and print it to show you what it looks like. That's a bunch of output: it goes all the way back to April 28th, and you can see I have successfully gotten all of the dates. From here, instead of printing them, which is kind of pointless, I say dates_list (which is what I created earlier) and append each date to it. So when I run this, I go through the list of tr elements again, get the date from each element, and append it to my dates list. Run that and check it out: I have caught 2,186 elements, and if you open the variable, these are all the dates in your dates list. Pretty cool. So we have successfully gotten the dates, and now I'm going to get the high. Here's another neat trick. If you take a look at this data, there is no missing data, and I think that's correct; but the class-based approach doesn't work well if you're expecting a day off, like the market completely closing or something like that, so I'm going to use this other neat trick. Once again I have a row, and I'm going to get one of its cells; but instead of going in like I did earlier with the class name "text-left", I'm just going to find_all the td elements, which gives me everything in there.
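The loop-and-append step just described can be sketched like this, again against a small inline table with the same assumed class names:

```python
from bs4 import BeautifulSoup

# Two sample rows standing in for the full historical table.
html = """
<table>
  <tr class="text-right"><td class="text-left">Apr 22, 2019</td><td>5436.54</td></tr>
  <tr class="text-right"><td class="text-left">Apr 21, 2019</td><td>5402.31</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

dates_list = []
trs = soup.find_all("tr", {"class": "text-right"})  # every data row
for item in trs:
    # Each row's "text-left" cell holds the date; append its text.
    date = item.find("td", {"class": "text-left"}).text
    dates_list.append(date)

print(dates_list)  # -> ['Apr 22, 2019', 'Apr 21, 2019']
```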
I'm pretty positive the second value in there is going to be what I want; again, Python counts from 0, so 0, 1, 2 makes the High column index 2. Add .text and you get the 5,400 figure. This is another cool trick, so I'll put it here: high_list.append() with that cell. You can use this same way to get anything else; I'm going to do market cap right now, counting 0, 1, 2, 3, 4, 5, 6 to find its index. OK, let's just run this and test it out. So you have everything. Just note that because we grabbed the values purely by position, and we don't have any try/except safeties here or anything like that, you have to make sure all the lists end up the same length.

Now, to do these things in Selenium, the first thing is to import it: from selenium import webdriver (and it's good practice to put the imports at the top). Then driver = webdriver.Chrome() with the path of wherever your chromedriver is; I'll put the link to download chromedriver in the description. If you run this with your path, you will see chromedriver opening the browser. Then driver.get(), same thing: you tell it which URL, which is this one, and it goes to that page. Selenium is testing software, so while you're doing this you get to see what it is doing, whereas with BeautifulSoup you don't get to see any of that (you can turn the browser window off). It's a little bit slower, but it does give you a lot more flexibility. So now I have this page, and I'm going to go for the date again; I'll just show you the date, since it's the same way you would get everything else, and then I'll show you how to do the CSV part. One thing to know is that it's not going to let you do anything else until the page is loaded, and it just so happens that when you select the All Time date range, the website is a little bit slower.
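The index-based trick might look like the sketch below. The column positions (2 for High, 6 for Market Cap) are assumptions about the table layout taken from the video, and the inline row is a stand-in for the real page:

```python
from bs4 import BeautifulSoup

# One sample row: Date, Open, High, Low, Close, Volume, Market Cap.
html = """
<table>
  <tr class="text-right">
    <td class="text-left">Apr 22, 2019</td>
    <td>5309.23</td><td>5436.54</td><td>5263.29</td>
    <td>5402.31</td><td>1006010000</td><td>95455207628</td>
  </tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

high_list, market_cap_list = [], []
for item in soup.find_all("tr", {"class": "text-right"}):
    cells = item.find_all("td")             # all cells in this row, in order
    high_list.append(cells[2].text)         # index 2 = the High column
    market_cap_list.append(cells[6].text)   # index 6 = the Market Cap column

print(high_list, market_cap_list)  # -> ['5436.54'] ['95455207628']
```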
Like I said, BeautifulSoup is way faster. But if you try this out on other websites, you might find that some websites don't let you get the data like that; remember how at the very beginning we got a 200 response, which was good news? Not every website is going to give you that, and there may be some trial and error, but you'll have a lot less of it with Selenium. Also, right when the page loads, it may not be focused on your element, so do some clicking around, then right-click and Inspect. Now you have driver.find_element_by_xpath and driver.find_elements_by_xpath. find_element is kind of like BeautifulSoup's find, and find_elements is like BeautifulSoup's find_all, so in the same way you can get one thing or multiple things at a time; and remember that every time you get a single element, you're only getting the very first match. In Selenium you can also find elements by tag, and you can interact with the page in a lot of different ways using other built-in functions, but right here I'm just going to show you how to do it with XPath. The first thing is to type a double slash, which means to search through everything, since the element could be located anywhere, even in the middle of the page. So I write // followed by td with class equal to "text-left", or I can find the tr by writing brackets with @class equal to the class name in quotes, which is "text-right" over here. Selenium finds this web element, and you can't really do anything with it until you add .text. So now you get a bunch of stuff, and there are many ways to go from this point: you can tell it to go one more level with a single slash, which means go to the next child element. I'll also put documentation for using XPath in the description if you want to learn more about that.
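A sketch of the Selenium approach follows. Note it uses the current Selenium 4 API, find_element(By.XPATH, ...); the older find_element_by_xpath form shown in the video was removed in Selenium 4. It assumes chromedriver is installed and on your PATH and that the "text-right" class still matches the site's markup, so treat it as illustrative rather than something guaranteed to run as-is.

```python
# Selenium 4 sketch (assumes chromedriver on PATH; class names from the video).
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()  # opens a real Chrome window you can watch
driver.get("https://coinmarketcap.com/currencies/bitcoin/historical-data/")

# // searches anywhere in the document; a single / steps down one child.
# find_element returns only the first match; find_elements returns them all.
first_date = driver.find_element(By.XPATH, '//tr[@class="text-right"]/td').text

rows = driver.find_elements(By.XPATH, '//tr[@class="text-right"]')
# find_elements gives a list, so .text goes on each item, not on the list.
dates_list = [row.find_element(By.XPATH, './td').text for row in rows]

driver.quit()
```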
One slash just means go down one child element, a child being whatever sits underneath. So I say go down one and find td, and there it is: the first element I found is the date. Now I'll show you what happens if I use find_elements instead: you cannot call .text anymore, because the result is a list, and you can't get .text off a list. Instead you loop through it and take each item's text, the same cool little trick we did earlier, which you can also use with Selenium. So it's going through and getting all the data right now, and there it is.

Now I'm going to show you very quickly how to put all the data you've got into a CSV file. This is just how I do it; there are a lot of ways to go about it. By the way, you can also download the complete Python file on my website, straight code dot net, slash learn; I'll put the URL in the description, so if you want, download the file and try it out yourself. I usually have a little function for this, but I'll show you line by line. I like to create the first row at the very top, because in a spreadsheet the first row is usually the titles; so I say the first column is Date, the second is High, and the third is Market Cap. Next I say rows = zip(), zipping up everything: rows = zip(dates_list, high_list, market_cap_list). Then, with open() and a file name, which I'll call coinmarket_example.csv, you tell it what to do, which is write ("w"), and every single time you also have to pass newline="". That's my CSV file. Then line_writer = csv.writer(my_csv_file), and I use line_writer to write the rows.
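The CSV-writing steps just described, gathered into one runnable sketch; the file name and the list contents here are placeholders standing in for the scraped data:

```python
import csv

# Placeholder data standing in for the scraped lists.
dates_list = ["Apr 22, 2019", "Apr 21, 2019"]
high_list = ["5436.54", "5402.31"]
market_cap_list = ["95455207628", "95000000000"]

row_0 = ["Date", "High", "Market Cap"]               # header/title row
rows = zip(dates_list, high_list, market_cap_list)   # one tuple per data row

# newline="" stops the csv module from double-spacing rows on Windows.
with open("coinmarket_example.csv", "w", newline="") as my_csv_file:
    line_writer = csv.writer(my_csv_file)
    line_writer.writerow(row_0)      # write the header even if the lists are empty
    for item in rows:
        line_writer.writerow(item)   # one CSV line per zipped tuple
```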
I write line_writer.writerow(row_0) first; I sometimes write the header row separately, because even if your lists turn out empty you can still see the titles. Then I say for item in rows: line_writer.writerow(item), and that's it. Oh, right, you have to import csv. Then you should have it: just search for your file name and you'll see your CSV file. That came out beautifully; I have all of the data in CSV now. One thing, and watch this very carefully: you have to make sure all of these lists are the same length (oops, those errors are from my other script). Once they're all the same length, you get something like this, and that's our end result; it worked very well. To complete this, you can grab all of these lines and put them together in one script; make sure you import csv and include row_0, and when you run the script you ultimately get the CSV. If you look again, I have Date, High, and Market Cap, which is what I typed for row_0, and this is very helpful when you have a lot of things piling up. Another great thing about this: if you're trying to get data that's constantly updating, like this page, then a day or two later you just come back to your script and press play, and it goes to that page, gets all of the updated data, and updates the same CSV file. So every day, instead of copying and pasting anything, you've automated this whole step, and you can run it as many times as you need. If for some reason you don't want it to overwrite your current CSV the next time you run it, just change the name and a new file will be created. Also, I forgot to put this here earlier: don't forget, if you want to download this, I'll put a link in the description. Thank you guys so much for watching; like and subscribe if you enjoyed this video.
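One detail worth making explicit about the same-length warning above: zip() silently truncates to the shortest input, so a short column makes rows vanish without any error. A small sketch of the check, with placeholder data:

```python
# zip() stops at the shortest list, so verify the lengths before writing.
dates_list = ["Apr 22, 2019", "Apr 21, 2019"]
high_list = ["5436.54", "5402.31"]
market_cap_list = ["95455207628", "95000000000"]

# A set of the three lengths collapses to one element iff they all match.
lengths = {len(dates_list), len(high_list), len(market_cap_list)}
if len(lengths) != 1:
    raise ValueError(f"columns have different lengths: {lengths}")
```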
Info
Channel: straight_code
Views: 8,028
Rating: 4.9607844 out of 5
Keywords: python, Cryptocurrency, csv, beautifulsoup, selenium, historical data, scrape, finance, algotrading, automation, web scraping, python tutorial, scraping, python finance
Id: XyyMjKOqyOk
Length: 21min 42sec (1302 seconds)
Published: Thu Apr 25 2019