WEB SCRAPING USING PYTHON | WEB SCRAPING IN PYTHON | SNAPDEAL SCRAPING | IMAGE URL EXTRACTION

Video Statistics and Information

Captions
[Music] Hello guys, welcome to Total Technology. This is protein, back with another video. Today's topic is web scraping, and today's website is snapdeal.com. It is a familiar website in India where users can buy anything, like mobiles, clocks, anything. It is similar to Amazon and Flipkart, but in ranking you can say it comes after them: Amazon, Flipkart, then Snapdeal.

So let us open the website first and inspect the elements which we will scrape. As you can see, here is the main website and its sections, with trending products and other things, like stickers; all the things are available here. We are not promoting the website. So today, let's say we will scrape... go to All Offers, Men's Fashion and, say, Jeans. OK. So you can see the URL here: it is snapdeal.com, products, slash men's apparel-jeans, and the query sorts according to popularity. So we use this. Basically we will use Selenium... no, not Selenium; here we don't need to open the website. We can do that, but not in today's video. In today's video we will use only BeautifulSoup and basic coding.

So first things first, let us open Visual Studio Code. You can use any IDE you want, like PyCharm and other things. Let us first create a new file, say snap.py. OK, sorry for the disturbance.

First we will import the necessary libraries, like import requests; through it we will request the URL. And then [Music] basically we will not import that here because we don't need it, though you can use it to save as a CSV or other things like that. But basically we will now import bs4, that is BeautifulSoup. Before that, if you don't have BeautifulSoup installed, you can install it with a command here: pip install beautifulsoup4, plus lxml also, for its formatting. Then again we will import lxml, and then, OK. So basically, suppose response is a variable where we
are doing the request: requests.get, with your URL here. Today's URL is, let's say, this one. OK, so this is the URL. After that you can convert it into text for parsing: response is equal to response.txt... sorry, response.text. OK.

Now, to parse the data with lxml we need to convert it into BeautifulSoup, so data is equal to... not BeautifulSoup, sorry, bs4.BeautifulSoup, with response and the parser, lxml. OK, now let's print the data and see whether it's coming or not after parsing it with lxml. So we will now run this code. OK, I think we need it in this format. Again, run this code. Yeah, OK, it has given our output, so let us remove this.

Now what we have to do is inspect the elements and find what we need. Basically we need the name, the price, the actual price and, let's say, the picture URL. For that, right-click and Inspect in Google Chrome. OK, let's change the position; it's better now, I guess. So let us say for this text: it is under product title, as you can see, and that is under product description rating. So let us copy this; we need to select this. So data.select, and we are going to select this. Name it read, and let us find the length of read, how many elements are coming. Let us run this code. OK, so it is giving us 20, which means it has found 20 products. Very good.

Our main objective is to extract, as we have said, the title. If you see here, the title is under product title, so we need to select this; it's under the p class. So you can, I guess, iterate over read like this. We will iterate through this list, and for doing that we simply define a for loop: for i in read. And, let's say, product name: product_name is equal to i.find_all, or you can also use i.select, and we are going to select the product title. OK, so let's print product name and see whether it is listing the products or not. No, it is not listing. Sorry, we need to put this dot here for the selection, I guess;
just run the code and see whether it is selecting. Yes, it is selecting the product title; all the names are coming here. OK, that's great. Now we need to get the text from it, so you can say product_name.text here, or get_text, which is the function to get the name here. So product_name is equal to that, and let us print this again and see whether it is printing only the name or not. Yes, it is. OK, that's fine.

Now the second thing we need here is, let's say, the strikeout price, so 2,499. If you see, it is under span class lfloat product description price strike. Basically it is in the same product description rating, and that is why we are going to use this read, because we have already selected this one. So we'll type original_price is equal to i.find_all, and under find_all, as you can see, it is under a span class, so we can define it as span and then we need to select this class name. OK. Similarly we will use get_text here, so original_price is equal to original_price.get_text. Let's again print the original price and see whether it is printing the original price or not. OK, yeah, fantastic, it is printing the original price here.

Similarly, we will then find the discounted price. The discounted price is here, similarly under a span class. The thing is, you need to know how to select these things; that is the main thing in web scraping, and if you practice day by day you will be an expert in it. Even I had a lot of difficulty in the beginning; it is very difficult to select all these things, but practice makes perfect. So discounted_price is equal to i.find_all again, and in find_all it is a span, and under span it is lfloat product price. Let us again type discounted_price is equal to discounted_price, not the list, we need the text only, so the get_text function. Then print the discounted price and see whether it is
printing the discounted price or not. Yes, it is printing. OK, so as you can see, we have extracted all the things here.

Now let us put all these things in arrays so that we can use them. Say, for product name, name is equal to an empty array; then copy: original price, another one; discounted price, another one. And we will append to these: name.append, and what we append here will be product name. OK. Then again op.append, and what we append here will be original price. Again dp.append, and here it will be discounted price. OK.

OK, for the rest of the part, we need to extract the images. For doing that, as you can see, the image is under the picture element; this is another class, where the image is. So for that we need another read, say read1, and we will do data.select, and we are going to select the picture element here. Let us print the length of read1 and see whether it is 20 or not. It's the same, so it has selected all the pictures, and now we need to extract them from here.

So how to extract it? Again we will go here, and if you see where the image is, select this one: it is under src, under the img class. So basically what we will do is write another for loop, say for j in read1. Here image is equal to, or you can say img is equal to, j.find_all, and what we will find is img. Under img it is under src, so for that we give a comma and src. Let's check whether it is finding or not... data-src; if it is not working then we will use the get method. Let us print img here. No, it is not selecting. So let us print img and see what it is printing. OK, so it has selected img, and its class is product image lazy load, but as you can see it is data-src here, the attribute of the img class. So what we will do is use another for loop here: for k in img, image is equal to k.img... sorry, k.get, and here data-src. Then if we print this, I guess, it will give the image URL properly.
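The steps described so far (parse the page, select the product blocks, pull out the name, both prices and the image URL) can be sketched roughly as below. This is a minimal sketch under several assumptions, not the code typed in the video: the hyphenated class names are reconstructed from the captions, SAMPLE_HTML is a hypothetical stand-in for the live page so the example runs offline, the helper names are made up for illustration, and "html.parser" is swapped in for the "lxml" parser used in the video so that no extra install is needed. It also uses select_one and find rather than find_all, because find_all returns a list of matches, which has no get_text of its own. The fallback from data-src to src is an addition that the video does not show.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the listing page. The class names are reconstructed
# from the captions and are assumptions about the page markup at the time.
SAMPLE_HTML = """
<picture class="picture-elem">
  <img class="product-image lazy-load" data-src="https://example.com/jeans-1.jpg">
</picture>
<div class="product-desc-rating">
  <p class="product-title">Slim Fit Blue Jeans</p>
  <span class="lfloat product-desc-price strike">Rs. 2,499</span>
  <span class="lfloat product-price">Rs. 749</span>
</div>
<picture class="picture-elem">
  <img class="product-image" src="https://example.com/jeans-2.jpg">
</picture>
<div class="product-desc-rating">
  <p class="product-title">Regular Fit Black Jeans</p>
  <span class="lfloat product-desc-price strike">Rs. 1,999</span>
  <span class="lfloat product-price">Rs. 599</span>
</div>
"""


def fetch_html(url):
    """Download a page and return its HTML text (not called in this sketch)."""
    import requests  # imported here so the parsing below runs without network access

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def text_of(parent, selector):
    """Return the stripped text of the first match, or None when nothing matches."""
    node = parent.select_one(selector)
    return node.get_text(strip=True) if node else None


def parse_listing(html):
    """Return four parallel lists: names, original prices, discounted prices, image URLs."""
    soup = BeautifulSoup(html, "html.parser")  # the video passes "lxml" here
    names, original_prices, discounted_prices, image_urls = [], [], [], []

    # One block per product, as with the variable called read in the video.
    for block in soup.select(".product-desc-rating"):
        names.append(text_of(block, "p.product-title"))
        original_prices.append(text_of(block, "span.product-desc-price"))
        discounted_prices.append(text_of(block, "span.product-price"))

    # One picture element per product, as with read1 in the video.
    for picture in soup.select("picture"):
        img = picture.find("img")
        # Lazy-loaded images keep the URL in data-src; fall back to plain src.
        image_urls.append((img.get("data-src") or img.get("src")) if img else None)

    return names, original_prices, discounted_prices, image_urls


names, original_prices, discounted_prices, image_urls = parse_listing(SAMPLE_HTML)
print(names)
print(original_prices)
print(discounted_prices)
print(image_urls)
```

To try it against a real page, pass the result of fetch_html(url) to parse_listing instead of SAMPLE_HTML, and adjust the selectors to whatever Inspect shows in the browser.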
Great. So sorry, I printed img instead of image; that is why it was returning a list here. So here you can see the image URLs are properly extracted, and it is showing them also. Similarly, we will create one product image is equal to an empty array here, and then append here: product_image.append, and we will append image. OK.

Now it's time to print the entire thing, or you can save it to a CSV. For now we will just print it. As you have seen, it has got 20 elements, so you can run another for loop here: say for k in range of the length of read or read1, whichever you want to consider, because the lengths are the same here. After that you can just build your output: output is equal to name at k, plus, say, if you want a space or not... OK, product name, plus then your, not discounted price, let's say original price, then its discounted price, then its product image. So let us print output here, and this should give you the entire scraped material here.

OK: can only concatenate str, not NoneType. OK, there is some output, I guess, which comes as None, because of this one: some of the elements didn't come, because of the data-src attribute. For that you can use string here: str of product image. Let us run this code again. There is some error, I guess, it is coming. Let us run it again. Why this type of output is coming, I don't know. [Music] OK, let us remove the print from here. OK, I am mistaken: name, op and dp should be used; that is why it is coming like this. Name, op, dp, plus str of image at k. Let us run this code again; hope this time it will work. Yeah, it is showing the name first; this is the name, and this is the... sorry, again I have committed one mistake: this will be product image, as we have mentioned here, or rather we use pi, this will be better. [Music] Here also, and here also, pi. Let us run this code again; this time it should come with the URL. Yes. So this is the jeans name, its price in Indian rupees, original price, then discounted price, and then the link, and
for better readability you can use one space here. So copy this. Let us run it again. What happened... OK, here it is coming: the name, then its original price, then its discounted price, and then the image URL. So that's it, guys. Hope you have enjoyed the video and learned something today. If you liked this video, don't forget to click the like button, subscribe to us and share our video. Have a great [Music] [Music]
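The final printing step, including the "can only concatenate str" error that shows up when an image URL is None, might look roughly like this. It is a sketch under assumptions: the list names name, op, dp and pi follow the ones spoken in the video, the sample values are hypothetical, and zip is used in place of the index loop over range(len(...)) described in the captions. Wrapping every field in str() is what avoids the concatenation error. The CSV helper is an optional extra, since the video only mentions saving to a CSV without showing it.

```python
import csv
import io

# Hypothetical sample data standing in for the four scraped lists.
name = ["Slim Fit Blue Jeans", "Regular Fit Black Jeans"]
op = ["Rs. 2,499", "Rs. 1,999"]  # original (struck-out) prices
dp = ["Rs. 749", "Rs. 599"]  # discounted prices
pi = ["https://example.com/jeans-1.jpg", None]  # None: the attribute was missing


def build_lines(name, op, dp, pi):
    """Join the four parallel lists into one space-separated line per product."""
    lines = []
    for fields in zip(name, op, dp, pi):
        # str() turns None into the text "None" instead of raising a TypeError.
        lines.append(" ".join(str(field) for field in fields))
    return lines


def to_csv_text(name, op, dp, pi):
    """Return the same data as CSV text; a missing value becomes an empty cell."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["name", "original_price", "discounted_price", "image_url"])
    writer.writerows(zip(name, op, dp, pi))
    return buffer.getvalue()


lines = build_lines(name, op, dp, pi)
for line in lines:
    print(line)

csv_text = to_csv_text(name, op, dp, pi)
print(csv_text)
```

Writing csv_text to a file opened with newline="" would give a spreadsheet-ready file; the csv module quotes the prices automatically because they contain commas.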
Info
Channel: Total Technology
Views: 785
Rating: 5 out of 5
Keywords: web scrap using python, web scraping in python, web scraping image extraction, snapdeal, BEAUTIFULSOUP
Id: qw0F2NSu7EY
Length: 32min 10sec (1930 seconds)
Published: Fri Aug 23 2019