You Should Use CSS Selectors for Web Scraping.

Captions
Hi everyone, welcome, John here. In today's video we're going to be talking about using CSS selectors, specifically with the requests-html package in Python, which is like an all-in-one for web scraping; it's really useful and really powerful. What I'm going to cover are some of the more common CSS selectors that you'll definitely want to know about, and hopefully a couple that you haven't seen or used before, to make your lives easier when you're trying to scrape specific parts of a website.

On the left-hand side I have my code. I'm importing requests-html, and this is the website that we're scraping, books.toscrape.com, which is a scraping sandbox for us to use. I'm going to write my CSS selectors in this variable, and then find and print out the elements as they come.

The first thing we want to know is how to select specific tags. If we look on the right-hand side here, and I make it bigger, we have the title tag, so we would simply type the word "title". There is hopefully only one of those, so when I run that we get back the title element. I'm not going to go into getting the text out of these, because that comes after this; this is just about selecting the elements that you want. Once you've got them, you can do what you need to with them. That works with any HTML tag: if I do "div", we get a really long list back.

The next thing is to be more specific. We can see that the body tag has an id of default and a class of default as well. To select by id we use the hash symbol and then type the id, "#default", and we get one element back. We can do the same with a class: to use a class we need a dot (a full stop) and then the class name, ".default", and when we run that we get the same element back. It's important to note that doing it this way matches any tag that has the class of default. On this page there is only one, which is why we get a list of one back. To be more specific and only get body tags with that class, we can type "body.default", which says: here's our body tag, and we're going for the class of default. Run that and we get exactly the same element back again.

In the inspect element tool you can also right-click, go to Copy, and choose Copy selector in Chrome. If I paste that in, it gives us "#default", because the id comes first, and running that returns the same element again.

That's all well and good, but we don't want everything under here, we want certain bits. If I look at this header tag, its class has a space in it. If I copy that, put a dot in front for class, and leave the space in, it doesn't match, because the space in the class attribute separates two class names. We can close the space up with another dot, though, which matches any element that has both this class and that class, and we get our header element back.

Another thing we might want is to find specific tags that are children of others. I'm going to quickly find the main part of the page with all of the information in it, which is this section tag; it has all of the products in it. Let's say we wanted to find all of the products individually. They're all in these list items, and we can see they have this long class name. This is quite common with certain web frameworks, like Bootstrap, so you'll see it quite often. If I copy this and put it in here as a class, with the dot at the front like before, we don't find anything. We could close all of the spaces up with dots and run that, and it returns all of these elements, because we are matching across all of those classes.

That works, but sometimes these class names will be slightly different, or the numbers at the end will differ, and this approach won't work particularly well. What we can see, if I collapse this ol, this ordered list, and make it bigger, is that all of the list items we want are inside it. This is quite a common pattern; you might have ol, or more commonly ul for an unordered list. If I use "ol li" as my selector, with a space in between, it finds all of the list item tags that are inside this ol tag. That is quicker, easier, and more reliable. And if the ol has a class as well, which it does, we can write "ol.row li", and we get exactly the same elements back. So we can be more specific about the parent and then find all of the tags inside it. This is probably the one you'll use most, because you're going to want to find all of the list items that are the products, or it might be a div, whatever the website uses.

If we collapse this one and look at this div here, it has no class or id or anything. But because we know we can find the ordered list with the class of row, which we did just before, we can say: get the next element, the one immediately after it. For that we use a plus symbol: "ol.row + div". What we're saying is that we want the div element that comes immediately after our ol element with the class of row. If we run that, we get this element back, even though it has no class and no id. This is really useful, because sometimes there is information in there that we want.

If I copy the CSS selector for this element from the browser instead, it looks something like this, because it's trying to trace the path all the way down. This will probably work, but you can see that it starts at "#default", which was the whole body, and finds it from there. If I run that, it gets it, but which is easier: that one, or just this? So the plus is very useful for finding elements that have no class, id, or any other attribute.

The next thing we might want is to find an element based on one of its attributes. Here we have a div with a class of alert alert-warning, but it also has a role attribute. Let's say the class didn't exist and we want any element that has a role attribute of alert. We know it's a div, so we use square brackets and say role equals alert: "div[role=alert]". I missed a bracket there; run it again and we have our element. This also works without the tag, "[role=alert]", if we didn't know it was a div or wanted to match the attribute across multiple tags, and it brings the same element back. So the brackets let us interrogate the attributes.

Let's find another attribute we can work with. Here is a button, and it has a type of submit, so we could find anything that matches a type of submit with "[type=submit]". We get quite a lot back, and that is because, I believe, every add-to-basket button is a button with a type of submit. That's quite a cool way of finding all of the elements you might be interested in. Again, we could put button in front, "button[type=submit]", if we were specifically after buttons as opposed to anything else that might have that type.

Another thing: let's say we want to get all of the links. Here there's an a tag with our link, so to get all of the links that have an href we can quickly and easily write "a[href]": a for the link tags, and the word href in brackets. That returns every link on the page that has an href we can use.

What about all of the images? On this specific page you could use the class of thumbnail, like we did at the beginning: "img.thumbnail", because the images have that class. If you spell thumbnail right, it returns them all. But you can also look at it a different way and say that within our img tags we only want the ones that have a src, the source of the image: "img[src]". That gets all of them back again. We can also combine the two ideas. We only want the images that belong to these products, and we know that under the a tag, which is the first part of the product, the next thing down is the image. So we can write "a img[src]" and get just those, as opposed to any other images that may or may not be on the page.

So that's it for this one, guys. Hopefully you've enjoyed it and got some value out of these basic CSS selectors, and maybe I've shown you some selectors that you didn't know. My favorite is the one where we put a space between two elements and find all of the elements inside that specific one, which is really useful for getting products and product information out. Thank you for watching, guys, and I will see you in the next one. Goodbye.
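
The selectors mentioned in the captions can be gathered into one short sketch. This is an illustration and not the code shown on screen: it assumes the requests-html package is installed and that the sandbox is reachable at https://books.toscrape.com/. The names SELECTORS and count_matches are made up for this example.

```python
# Selectors spoken in the video, keyed by a short description of what they do.
SELECTORS = {
    "tag": "title",
    "id": "#default",
    "class": ".default",
    "tag with class": "body.default",
    "descendant": "ol.row li",
    "adjacent sibling": "ol.row + div",
    "attribute value": "[role=alert]",
    "tag with attribute value": "button[type=submit]",
    "attribute present": "a[href]",
    "descendant with attribute": "a img[src]",
}


def count_matches(url="https://books.toscrape.com/", selectors=None):
    """Fetch the page once and return {description: number of matching elements}.

    The default URL is an assumption based on the site named in the video.
    """
    # Imported inside the function so the table above can be read and reused
    # even where requests-html is not installed.
    from requests_html import HTMLSession

    selectors = SELECTORS if selectors is None else selectors
    session = HTMLSession()
    response = session.get(url)
    # response.html.find(selector) returns a list of matching Element objects.
    return {name: len(response.html.find(css)) for name, css in selectors.items()}
```

Calling count_matches() makes one network request and reports how many elements each selector matched, which is a quick way to compare a broad selector such as "[type=submit]" with a narrower one such as "button[type=submit]".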
Info
Channel: John Watson Rooney
Views: 3,259
Rating: 5 out of 5
Keywords: css selectors tutorial, css selectors explained, css selector attribute, css class selector, descendant selector css, python css selector example, css id selector in html, css multiple class selector, css selector python, css selector tutorial, css selector tricks, web scraping css selector, python web scraping, requests-html, web scraping tutorial
Id: hkDAW7hhEYU
Length: 10min 27sec (627 seconds)
Published: Wed Jan 20 2021