How To Perform Web Scraping In Python Using Selenium

Video Statistics and Information

Captions
Hey everyone, this is Mukesh Otwani once again from learn-automation.com. Today in this video we are going to talk about a very interesting concept called web scraping. Not only will we talk about web scraping, we will also talk about how you can export the data into an Excel sheet and how you can send it as an email, so it's a complete workflow that we are going to automate. First we will scrape the data. I'm taking one example, but you can apply the same concept in multiple places. In this example I will search for a product on Amazon, get the results, fetch them, and dump that data into an Excel sheet. The Excel sheet is created at runtime, and we will send that sheet as an email attachment. So there are three different things we are trying to automate here, and it's going to be very easy; you just need to understand how these things work.

In our previous videos we discussed how to type, how to click, and how to start the browser. In other videos we have seen how to read Excel and how to write Excel, and in separate videos we have already seen how to send email with and without attachments. This video combines all of those concepts, and I will show you how to use them together. This will give you clarity on how the Python concepts we discussed work in practice: we have seen how lists work, we talked about functions, we will write a small for loop that simply appends the data, and finally we will send the email. So many concepts come together here, and if you understand this video, a lot of things will become clear in your mind. And if you look at freelancing websites, you will find many projects of this type, where the assignment is to scrape data and send it to them at a regular interval. In upcoming videos we will also talk about how you can schedule a task, because Python also has its own scheduler, where you can run your automation script at a set time interval.

So let's see how it works: let's jump to the desktop and write the code from scratch. For those who are completely new to web scraping, it is a technique where you extract data from a particular website or for a particular product, and then use that data for some kind of analytics. Let's say I want to capture some data from Amazon: I want to extract the data, store it in a file, and then maybe use that file for analytics or some other purpose. In this case I want to search for Samsung, or maybe iPhone; let's say I'm looking for Samsung phones. These are the phones I'm getting. If you look, we got a couple of results, and there are a couple of ads as well. To remove these ads, on the left-hand side I can say that I want only the Samsung brand. Now you can see the product name, the price, the ratings, and everything. I want to write a small web scraping script that extracts all the records, and by record I mean just the phone name and the price, so that I can analyze which one I want and see the price range and so on; it's totally up to you.

So let's do it one part at a time: the first part uses Selenium, where we open the browser and capture the details; the second part is Excel; the third part is sending the email. I already have a package called web scraping demo, so let me create a Python file and call it web scraping amazon. Guys, the web scraping concept stays the same.
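Part one, the Selenium step walked through in the video, can be sketched as below. The sketch assumes a recent Selenium 4 release, which locates chromedriver on its own through Selenium Manager, so the video's webdriver-manager step is left out. It also assumes Chrome is installed. The XPath locators are reconstructed from the ones dictated in the video, and Amazon's markup changes often, so treat every locator as an assumption. The helper names `pair_results` and `scrape_phones` are hypothetical. Pairing the two lists with `zip` is the approach used in the video. The length check is an addition: `zip` silently stops at the shorter list, so an over-matching XPath (like the 25-node one in the video) would leave names and prices misaligned without any error.

```python
"""Part one: scrape phone names and prices from an Amazon search (sketch)."""

# Locators reconstructed from the video; Amazon's markup changes often,
# so treat every XPath below as an assumption to re-check in DevTools.
SEARCH_BOX = "//input[contains(@id,'search')]"
GO_BUTTON = "//input[@value='Go']"
BRAND_FILTER = "//span[text()='Samsung']"
PHONE_NAMES = "//span[contains(@class,'a-color-base a-text-normal')]"
PRICES = "//span[@class='a-price-whole']"


def pair_results(names, prices):
    """Zip names with prices, refusing to pair lists of different lengths."""
    if len(names) != len(prices):
        # zip() would silently drop the extra items and hide the mismatch.
        raise ValueError(f"{len(names)} names but {len(prices)} prices")
    return list(zip(names, prices))


def scrape_phones(query="samsung phones", headless=False):
    """Open amazon.in, search for `query`, and return (name, price) tuples."""
    # Imported here so the pure helper above works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By

    opts = Options()
    if headless:
        opts.add_argument("--headless")  # run Chrome without a visible window
    driver = webdriver.Chrome(options=opts)
    try:
        driver.maximize_window()
        driver.get("https://www.amazon.in")
        driver.implicitly_wait(10)  # wait up to 10 s when locating elements
        driver.find_element(By.XPATH, SEARCH_BOX).send_keys(query)
        driver.find_element(By.XPATH, GO_BUTTON).click()
        driver.find_element(By.XPATH, BRAND_FILTER).click()
        names = [e.text for e in driver.find_elements(By.XPATH, PHONE_NAMES)]
        prices = [e.text for e in driver.find_elements(By.XPATH, PRICES)]
        return pair_results(names, prices)
    finally:
        driver.quit()  # always close the browser, even if a locator fails
```

If the locators still match, calling `scrape_phones(headless=True)` should return a list of `(name, price)` tuples like the ones printed in the video.

A few design choices differ from the video:
- The two `for` loops with `append` are written as list comprehensions.
- `driver.quit()` sits in a `finally` block, so a failing locator does not leave stray browser windows open.
- The video passes `chrome_options=opt`, an older, deprecated keyword; current Selenium 4 releases take `options=`.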
Whether you work with Amazon or any other site, the things I'm going to show you stay the same, so don't worry, you can apply them to any application.

First of all I need the web driver. For those who are completely new to this channel: before you start web scraping, make sure you have installed PyCharm, which I'm using right now to write the Python scripts, and you also need Python 3 installed. If you don't have them, I recommend watching this video, where we discussed everything about Python and PyCharm. For this example I'm using Chrome, so I will start Chrome. I'm also using the Chrome driver manager we discussed, which automatically takes care of the browser driver and returns the driver. Before we go further, let's maximize the window. Now let's open the application we want to scrape; in our case that's Amazon, so I will type amazon.in. Next I want to set an implicit wait: if something is not immediately available, Selenium should wait a maximum of 10 seconds, but if the element appears before that, it should move forward.

The next thing is entering "samsung phones". Let me go to the home page again, right-click, and inspect. The search text box has the id twotabsearchtextbox, so let's use it. I will write a small XPath that checks for an input tag whose id contains this value. If I remove part of this text, even the "twotab" part, it should still work, and yes, you can see only one matching node. We could store this element, or we can use it directly here: driver.find_element, using the By class, By.XPATH. You can see we have written a very smart XPath that only searches for an input tag whose id contains "search". Then I simply call send_keys with "samsung phones".

Now let's search for the same thing manually. I will type "samsung phones", and on the left-hand side you can see I'm getting a couple of ads, so let me also click this check box so that I get only the Samsung records. There is a text label here, so I can write an XPath that finds a span tag, because all this text is under a span, and look only for the text "Samsung". You can see we got it, and the moment I click on this text, the checkbox gets selected. So I use this XPath to click the checkbox: the same By.XPATH, this XPath, and then .click(). Once you click here, you get all the records.

Let me take the name first. If we expand this, you will notice a class that says a-size-medium a-color-base a-text-normal, and if I expand it further I get the name. So I take this class and search for a span tag; let's use contains, because we want to optimize this a little. As you can see, we got 24 matching nodes, so it matches all the records; it highlighted all of them. Perfect. Let's remove some of the characters that are not needed, and you can see there are still 24 records. But if I remove this part as well, we get 25, which means one additional record is coming in. As you can see, the 25th one is something we don't want, so let's go back to the previous XPath, a-color-base a-text-normal, which matches exactly the elements we need. Perfect. This time I will use find_elements, because when you work with multiple web elements you need find_elements, as we discussed already. So By.XPATH, and here I will just give
this XPath, which will match all the phones. This returns a list, so I will store it in phone_names. phone_names is basically a list in Python, a list of web elements. We can do the same thing for the prices. If you check the first price, you can see a class called a-price-whole. It's still a span tag, but this time we use this class instead; let's remove this space, and you can see we got the prices as well. If I trim it a bit more, we still get them. What if I use just this part? Then "price" matches multiple elements, so let's stick to a-price-whole. So the second list will be find_elements again, By.XPATH, and here we get all the prices; let's call it prices.

Now, if you want to check whether this part is working, let's run a small for loop that gets the phones one by one. This is a list, and this is also a list of web elements, so let me iterate the first list, which gives me the phones one at a time. Each item is one web element, so if I want its text I just use .text. This prints all the phone names. The second for loop prints all the prices, so I say for price in prices: it takes the first price, stores it in this price web element, and then again I can use price.text. So we iterate over the phone names (I keep saying phone numbers, sorry) and then over all the prices. The first phone name goes with the first price; that's how we need to combine them, and I will show you how. But first let's run this. To separate the two outputs, after the phone names I will print "*" * 50, so the star is printed 50 times, and once we're done I want to close the browser, so I call driver.quit().

It starts our session and searches Amazon for Samsung phones. We made a small mistake, guys: we have to click the search button as well. For the time being I will click it manually, and then we will write the code for it. As you can see, we got the phones and the prices. Let's cross-verify: the first phone was the Samsung Galaxy M01, this one, Black, 3GB RAM, 32GB storage, with no-cost EMI. Perfect. And if you check the prices, you will see the same prices. If you scroll down, you will see the star printed 50 times, and then we get the prices. Fine.

Now let me add the step we missed: after send_keys, I want to click the Go button as well. I'll do a quick search again, but let me go to the home page first, because this is the flow we need to follow, and this is the button we need to click. If you look further down, you will see a very straightforward input tag where value is equal to "Go". So let's find the input tag where value equals "Go"; we got the matching record. Let's add it quickly: By.XPATH and .click().

Now let me show you one more thing. Let me remove this driver.quit(), because I want to show you something. Once you have captured the prices, we need them in some kind of collection so that we can put the data into Excel, CSV, or any other file. So I will create two lists: I will call one my_phone (you can use any variable name) and the other my_price. This is how we create an empty list: when you don't put anything inside, just write the brackets and it creates a list for you. We do the same thing for the price, so we have two
empty lists. Once I capture the text (after printing, or you can skip the printing if you want), I just call my_phone.append with whatever text we got, that is, phone.text. So the loop prints the same thing and also appends it to this list. I want to do the same with the prices, so I say my_price.append(price.text). One thing you have to notice, guys: these are two separate lists, so the first phone goes with the first price, the second phone with the second price, and so on. Right now we are only scraping the data, and if you look, the first phone and the first price match up here. So I created two lists, and once these for loops are done, they will have filled the data into those two lists.

Now this is the main part that we discussed earlier: Python has a zip function with which you can zip two or more iterables. The first iterable I have is my_phone, so I pass my_phone, and the second thing I want to zip is my_price. This zips the two lists, and I get a zip object, which for the time being I will call final_list. Let me print it. This is a zip object, and I want to convert it into a list so I can see and reuse the records. So let me run a for loop: for data in list(final_list), passing the zip object to list, and each record comes into this data variable; let me print data. Once you have the data, it's up to you how you want to play with it; right now we are just making sure we get the proper data. It enters "samsung phones", hits Enter, and clicks the Samsung check box, as you can see, and we got the result as well, guys. Perfect. This is the first record with the price. Sorry: this is the name and this is the price, name,
price. So what we got is a list of tuples, or I would say a list of records, and this is done by zip: this is one list, this is the other list, and zip gives you tuples, so this is one tuple, this is another tuple. Basically we got a list of tuples, and now it's up to you how you want to use it. If you want to cross-verify again: the Samsung Galaxy M01 price is 7,499, and in the session that is running you can see 7,499. The second is the M21 in Midnight Blue, this one, and its price is 13,999, which matches this. You can validate the last records as well.

Now let's store this data in an Excel sheet. For this we are using openpyxl; we have created a dedicated tutorial on it, which I will link in the description, so go ahead and watch it. I'm going to use the Workbook class, so we create an object of Workbook; let me call it wb. Once you have this wb object, let's take the active sheet, which is the default sheet, and store it in sh1. Now this is my sheet, this is my workbook, and this is my final data. How do I use it? Let me run a small for loop, which runs as many times as there are records in the list. I will say for x (or abc, or i; let's use x) in this list of records, and every record I get, I need to append to this sheet. So I say sh1.append, which is a predefined method in openpyxl, and what we need to append is x; that's all. Once this for loop is done, it will have appended all the records we extracted from the web. To save the file, just call save on the wb object and provide the file name. Let me save this file as FinalRecord.xlsx; the name is up to you.

So this is part one, and this is part two; part three will be sending the email. Before we run this, let me add one final point, guys. We are appending these records, which come from zip. A zip object is an iterator: once you loop over it, it is exhausted. So if you iterate over this zip data up here, you will not get the same data down here; the data will be lost and you will get nothing. So make sure you don't iterate over it here. This is the final list we are getting, and we have already verified that the data is fine. I can also remove the print statements right away; I don't need this one or this one. Everything else is fine, so let's right-click and run it once again, and this time we should get the data.

One more enhancement we will make later: right now the script runs in normal mode, but we can also run the scraping in headless mode, so it runs in the background and you can do other activities meanwhile. Right now we are just waiting for the script to complete. Let's go back, and yes, you can see part one is done and part two is done. Go here, and you can see we got the file with the name we gave, FinalRecord.xlsx. Let's open it, and here we go: exactly the same data we scraped, this is the name and this is the price. Now, you can see the sheet name is coming as "Sheet", which doesn't look good, so let's change the sheet name as well, before we do the appending. If you notice, the sheet here is "Sheet" with a capital S, and we have a dedicated sheet object. So what we want to do here is simply change
the title, that is, .title, and I will name it "Amazon Samsung Data". That's all; everything else stays the same. Also, right now the sheet has no Name and Price heading, so you can create a quick list and add it to sh1. I will simply call sh1.append, and in the iterable I first provide Name and then Price. This becomes my first row, and the subsequent rows will be the records I'm already appending. Let's run it once again so we can quickly see these two changes. Let me close these, because otherwise I end up with multiple windows, multiple browsers actually. Perfect, let's wait for part one and part two. Yes, done. Let's go back and open this Excel file once again, and perfect: this is the title we updated, Amazon Samsung Data, and this is the heading we set, Name and Price.

The last part is sending the email, which is very interesting. For email, guys, we have two dedicated videos. If you go to this package called email demo, we discussed how to send email with an attachment there, and that is exactly what I'm going to do now. In test email 3 we discussed how to send email, so let's copy the exact same code, including the imports. Let's go back to our program in web scraping demo. Don't worry, I will explain the code, but you do need to refer to the two videos I recorded, where I explain everything in detail. I will copy these few lines of code and paste them right after part two. So what is happening? First of all, we create an object of a class called EmailMessage. Then we set the subject, which is "Training Invitation" for now; I will change it to "Samsung scraped data", or phone data, or something else, depending on what kind of data you have scraped. Then for From, you can put "Automation Team", or if you are a freelancer you can write your name or something; I will write "Automation Team". I'm sending it to these two participants, this id and this id.

And this is an email template we have. If you don't want to use the email template, that is also fine, but if you want it, I will first keep this .txt file here, where I write "Hi team, please find attached the test report, which is scraped from the Amazon site. Thanks", with whatever name you want to give. It's just a .txt file, available in the same package, so the code opens that template, reads the content, and sets it as the content of this email. The second part is which attachment you want to add. In our case the Excel file is FinalRecord, so I will say FinalRecord.xlsx. The code reads the file in binary mode, gets the file name, and adds it as an attachment; here we are giving the main type as application and the subtype as xlsx. Finally we send the email: I log in with my account and send this message we have configured. Once we are done, let's print "email sent", and I'll remove these part one and part two prints. Let's right-click and run the final iteration, and let me open the email accounts I gave as To and From. Give me a second, guys, let me quickly open that email. Yes, it's done, guys. If you check, it printed the file name, which comes from here and is not required, so I can remove it; then it printed "email sent". And at the address I gave as the recipient, I'm getting an email that says "Hi team, please find the test report, which is from the Amazon site. Thanks", and this is the
Excel file that we extracted. Just open it, and here we go.

Now, for the last iteration, there is one more enhancement I want to make; it's totally up to you whether you incorporate it in your code. Most scraping happens in the background, in headless mode, while right now everything opens in the foreground as it runs. To make it a background process, running in headless mode, we need to use the Options class. If you type Options and press Ctrl+Space twice, you can see Selenium has dedicated Options classes that help us run our test in headless mode. There is one for IE and others for other browsers. Since we're working with Chrome, I need the Options class from the package selenium.webdriver.chrome.options. First we create an object, which I'll call opt. To make it headless, call add_argument on opt, the Options object, with the argument --headless; this makes your test run in headless mode. Then we need to pass this object to the Chrome constructor, so after the existing argument I add chrome_options and pass opt. Now everything will run in the background. Just so we can see the difference, let's add "New" to the name so that I don't have to delete the old file, and I'll pass the same name here for the attachment. That way we'll know this is the new record we get when running our test in headless mode. Let's run it once again; hopefully it runs in the background, in headless mode. And yes, we just get a quick warning that it picked up the latest driver version, plus a deprecation warning; that's fine. It will scrape, add the data to Excel, and send an email, and as you can see, guys, it did not start anything in the foreground; everything happened in the background. If I go back and check, this is the new mail I got, and the file name is FinalRecordNew.xlsx. Perfect.

So that's a quick example; I hope you enjoyed it. Guys, if you're able to complete this, trust me, you will be using many of the Python concepts we discussed in our Python series; this is a final implementation showing how you can use all of them. I hope you enjoy this web scraping part. Please try it on your end and let me know if you need any kind of help or support; I will try my best. And if you want this kind of video with a different site or different examples, just let me know in the comments section, and I will try my best to cover them. That's all, guys. If you're new to this channel, please subscribe and share with your friends, and we'll see you in the next video. Have a nice day, bye!
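Parts two and three, saving the scraped rows to Excel and emailing the file, can be sketched as below. This rests on stated assumptions:
- openpyxl is installed for the Excel step.
- The email goes through an SMTP server over SSL.
- The host, port, sender, recipients, subject, body text, and file name are hypothetical placeholders, not values from the video.
- `write_excel`, `build_report_email`, and `send_email` are illustrative names.

The sheet title, the Name/Price header row, and the `append` loop follow the steps shown in the video. For the attachment, this sketch uses the registered MIME type for .xlsx files instead of the `application`/`xlsx` pair mentioned in the video.

```python
"""Parts two and three: save (name, price) rows to Excel and email the file."""
import smtplib
from email.message import EmailMessage

# Registered MIME type for .xlsx workbooks, split into maintype/subtype.
XLSX_TYPE = ("application", "vnd.openxmlformats-officedocument.spreadsheetml.sheet")


def write_excel(rows, path="FinalRecord.xlsx", title="Amazon Samsung Data"):
    """Write a Name/Price header plus one row per (name, price) tuple."""
    from openpyxl import Workbook  # third-party; imported only when needed

    wb = Workbook()
    sh1 = wb.active            # the default sheet, titled "Sheet"
    sh1.title = title          # rename it, as in the video
    sh1.append(["Name", "Price"])
    for row in rows:           # rows must be a list, not an exhausted zip object
        sh1.append(list(row))
    wb.save(path)
    return path


def build_report_email(sender, recipients, subject, body, attachment, filename):
    """Build a plain-text email with the workbook bytes attached."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg.set_content(body)
    maintype, subtype = XLSX_TYPE
    msg.add_attachment(attachment, maintype=maintype, subtype=subtype,
                       filename=filename)
    return msg


def send_email(msg, host, port, user, password):
    """Log in over SSL and send the message (all arguments are placeholders)."""
    with smtplib.SMTP_SSL(host, port) as smtp:
        smtp.login(user, password)
        smtp.send_message(msg)
```

Wired together, the flow would run in three steps:
1. `path = write_excel(rows)`.
2. `build_report_email(...)` with the bytes of that file (for example from `open(path, "rb").read()`) as the attachment.
3. `send_email(msg, host, port, user, password)` with your own server details.

Keeping message construction separate from sending lets you inspect the email before any network call is made.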
Info
Channel: Mukesh otwani
Views: 11,719
Rating: 4.9653678 out of 5
Keywords: web scraping with python, Web Scraping In Python Using Selenium, web scraping, web scraping excel, web scraping example, web scraping excel email
Id: DUdvSxoaPlk
Length: 33min 11sec (1991 seconds)
Published: Mon Dec 14 2020