Put This In All Your Python Projects Now

Captions
Okay, so if you've started to write more complicated web scrapers, you're going to want to know more about what's happening. You've got lots of functions, maybe you've got classes, and you need to know what's happening when. One way you could do that is with print statements. On my screen here is some code that scrapes products from Amazon. I'm going to run it and you'll see what we get out, because I have a print(product) statement within my function: you can see we're getting these two tuples out, with the bits of information that I've scraped. Now that's all well and good, but say we run this on our web server, or our Raspberry Pi, or something like that, and we want to keep a proper log of what's happening, where and when. We could stick some more print statements in, say print(r.status_code), so we can see we're getting a good status code back. Okay, let's run that again: we get some 200s, which is useful. We could just redirect this output to a file (in Linux you can do that really easily, and you can do it in Windows too) and save all that information, but there is a much better way: we can use logging within Python to do it for us. The good thing is that it will automatically log all of the requests that we're making to the server, which is really useful.

To do that, we're going to import logging, and then we just need some basic configuration, so up here I'm going to call logging.basicConfig. This is just scratching the surface of logging, but I think you'll find it really useful: it's basically a few extra lines in your code, and you get a good log file at the end. The next thing we want to do is specify a filename, and I'm just going to call this scraper.log. Then we want to set our level, and I'm going to use logging.DEBUG. There are different logging levels; DEBUG is, I think, the lowest one, and it will give you all the information you need. So go ahead and put level=logging.DEBUG in. You can play around with it, and use INFO or WARNING if you're dealing with that sort of information, but for now just copy this and you'll see where we get to.

Now we want to format the actual message, so that when we log, we have the right bits of information in there. The most important one, I find, is always to include the date and time, so you can see when each thing happened: if you're going to run this code loads of times on a server, every day or every hour, you want to know which run failed. So on a new line we add format=, and we want the percent sign and asctime, which gives us the actual time, then a dash, and then the message; we need the s at the end of each placeholder as well, so it's '%(asctime)s - %(message)s'. All this is doing is giving us the time, then a dash, and then our message. After this we want to specify the format of the time itself, which we can do with (I think it's) datefmt. If you've ever used strptime or strftime with the datetime module, you'll be familiar with this: it's basically the same thing. I'm going to do %d for the day, then %b, then the year. %b gives us the short month name, so September will come out as "Sep" rather than a number. I'll put the time in as well: hour, then the colon separator, minute, another separator, and seconds. There we go, we got there in the end.

Okay, so now that we've done this, we're going to automatically log all of the requests that we make to the server in this scraper.log file. I'm going to run it, and you'll notice I've taken away my print statement, so we're not going to see anything come to the screen. We'll let it run, hopefully it'll finish nice and quick, and then we'll go and have a look at the files. Okay, that's finished. If we come back over to our files, we have a scraper.log file, and this is where it's really useful. You can see how we formatted it: it's "Sep", not "Sept", so we've got 9th September 2021, and then the time in the format we put in. Let's just remove this terminal so we can see: that's our timestamp, and our message, which is "Starting new connection". So we started a new connection, and then here's where we tried to connect to: we went to this URL, that's the request type, and there's our 200 response. This is really useful if you're trying to capture which requests are going where, and it's why I would recommend, especially for people who, like me, started doing lots of web scraping, that you implement logging like this in all of the web scrapers you use.

Now, to make this one step more useful, all I'm going to do is add in a quick log call here: logging.info(product). What's going to happen now when I run this again is that every time we loop through and run our function, it's going to log the output of this product tuple as well, so we'll see the actual scraped data within our log file. There we go: let's remove this and run it again. Here's our new timestamp, and note that when you log to a file it always appends, so you don't have to worry about your logs getting overwritten. Every time it runs, you can see the information that came back from each request. And this is why I think you should definitely put logging into your web scraping files. If you're interested in more web scraping stuff, go ahead and check out this video right here.
Info
Channel: John Watson Rooney
Views: 2,031
Rating: 5 out of 5
Keywords: web scraping, python web scraping, web scrapping, python logging, simple logging, code tutorial, web scraper logs
Id: fYmQLv16-44
Length: 6min 13sec (373 seconds)
Published: Thu Sep 09 2021