How To Scrape Multiple Pages on Websites | Web Scraping using BeautifulSoup

Captions
Hello everyone, and a very warm welcome to WsCube Tech. So guys, in the previous session we learned what kinds of websites have multiple pages. In today's session we are going to talk about how to deal with such websites and how to extract data from them. In part one we will work with the URLs: how to move to the next page's URL using Beautiful Soup and the Requests library. So let's get started.

If you remember, in our last session we were viewing websites like Airbnb and Flipkart. If I open Flipkart right now and scroll down to the first page, this is how the first page looks, and the second page is similar. Here, though, we will not go to the second page directly. Instead we will use the Next button, and we will also use this Next button for extracting the URLs. First of all, let's get the URL of this page and copy it. I'll right-click on the Next button and open the link in a new tab, and if I scroll down, yes, this is the link of our second page. Now if I inspect the Next button in the navigation bar, you can see that it holds the link of page 2. If you don't believe me, let me click on it: a new page opens, and it is the second page. If I inspect the Next button on this page, it gives me the link of the third page, and if I click that link we are directed to page number 3. So we will use the URL behind this Next button to move to the next page and extract the data from it.

In this video, let's first learn how to move to different pages using Python. We go back to our Python program and write import requests and from bs4 import BeautifulSoup. Once that is done, let's request the page: r = requests.get(url), where url is the URL of page 1 in double quotes, which I copy from the browser and paste here. Then let's print r and check the status code. As soon as I run it, the response is 200, which means we can obtain the HTML of the page and scrape the data.

Next we write soup = BeautifulSoup(r.text, "lxml"), that is, we pass r.text and parse it with the lxml parser. If I print soup, the output is the code of the whole page, but we don't need the whole page right now; we only need the link. The link lives in the Next button: clicking it takes us to the next page, so we need the href of this button. To find it, I inspect the navigation bar and go to the Next button. We can see over here that it is an a tag with a particular class, so let's copy that class and go back to the code. For the next page I'll create a variable called np and write np = soup.find("a", class_=...) with the class we copied, and from that tag we take the href. Let's print np and run it.

We have got a link, but it's a huge one. Let me copy it from the end back to the start, open a new tab in the browser, paste it and press Enter. I'm not getting the second page of Flipkart. Why? Because the link starts with /search, whereas in the browser's address bar the host name comes in front of search. So let's copy the host name, come back to the program and create a new variable called cnp, for complete next page: the host name string with its trailing slash removed, plus our next-page string. Now let's print cnp and run it again. This time the text is highlighted as a link, because it is now a complete URL, and as soon as I click on it I get page number 2. If we scroll down, you can confirm it is page 2. So this is how you obtain the link of the next page using the Next button.

Once we have this, we need to make the new URL the one we request, and then repeat the Beautiful Soup steps. So url = cnp, then r = requests.get(url) again, because this is now the URL of our new page, and again soup = BeautifulSoup(r.text, "lxml") to get the HTML of this page. Then we go to the Next button again, just as we would in the browser by inspecting it on every page. Let's put all of this inside a loop, while True:, and see what output we get each time. Before I run it, I'll tell you that we will hit a small problem here, not exactly an error. As soon as I run it, I'm getting the links: with the help of while True it goes from page 1 to page 2, extracts the HTML of that page, goes to the Next button, gets the URL of page 3, and so on. But if I scroll towards the right, you will see that it is moving between pages 2, 1, 2, 1, 2, 1.

Why is this happening? This approach only works when different pages have different links. In the case of Flipkart, if you look at the end of the URL at the top, the URLs of all the pages are the same except for a number attached at the end: for page 1 we have 1 at the end, for page 2 we have 2, for page 3 we have 3 and for page 4 we have 4. So to move between these URLs we need a for loop that keeps the whole link the same but changes the number at the end.

Using the for loop is very simple. Let's remove the while True for now. We don't need it right now, but it will definitely work for websites that have different URLs for different pages, and we will cover that in our project as well. We write the for loop at the top: for i in range(...). If you want to get the data of, say, 10 or 20 pages, you write range(1, 11), with a comma and not a colon, to get the URLs of 10 pages. Let's put everything inside the for loop. What we need to change is the end of the URL: remove the 1, put a plus sign and append str(i). Now let's run it and see what output we get.

As soon as I run it, we start getting the URLs and responses. But if I go towards the end, you will notice that it goes 1, 2, 3, 4, 5, 6, 7, 8 and 9, and that 2 and 1 are coming up again over here. That's because we need to start from 2 instead of 1. So if I run it again, we get one, two, three and so on up to nine. Why nine? Because it now starts from 2: range(2, 11) gives pages 2 through 10, which together with page 1 makes 10 pages. So this is how I'm getting all the pages.

I hope you have no doubts or questions about how we move to different pages. All we need is a for loop, but we first need to understand what is inside the link, because every link is different and we need to spot the pattern. In this pattern, as you can see, we have 1, 2, 3, 4, 5, 6, 7, 8, 9. On Airbnb, though, if I inspect the page (let me make it a little smaller, scroll down and click on the navigation bar), you can see that the links do not follow a similar pattern and contain no such number; every link is different from the others. In that case we will not use the for loop; we will only use the while loop.

So as of now we have learned how to move to different pages. In our next session we will learn how to scrape the data from the first page, move to the next page and scrape the data from the second page, then move on to the next page, and keep doing the same thing. So stay connected, guys, and I'll see you in the next session. Thank you.
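The Next-button step from the transcript (find the a tag, take its href, then prepend the host) can be sketched as follows. This is a minimal sketch under stated assumptions, not the video's exact code.

- The HTML snippet, the pager-link class and the q=laptops query are hypothetical placeholders.
- It uses Python's built-in "html.parser" instead of lxml, so no extra parser needs to be installed.
- It replaces the manual host-string concatenation with urllib.parse.urljoin, which resolves a relative href such as /search?... against the current page URL.
- It matches the link by its text "Next", on the assumption that the Previous and Next links might share a class. That is one possible cause of the 2, 1, 2, 1 bouncing described above, but the video does not confirm it.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical stand-in for a results page; in the video the HTML comes
# from requests.get(url).text instead. Class name and query are placeholders.
SAMPLE_HTML = """
<nav>
  <a class="pager-link" href="/search?q=laptops&amp;page=1">Previous</a>
  <a class="pager-link" href="/search?q=laptops&amp;page=3">Next</a>
</nav>
"""


def next_page_url(html, page_url):
    """Return the absolute URL behind the 'Next' link, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    # Match on the link text rather than the class alone, assuming the
    # Previous and Next links may share the same class.
    link = soup.find("a", string="Next")
    if link is None or not link.get("href"):
        return None
    # urljoin resolves a relative href like "/search?..." against the host.
    return urljoin(page_url, link["href"])


print(next_page_url(SAMPLE_HTML, "https://www.flipkart.com/search?q=laptops&page=2"))
# → https://www.flipkart.com/search?q=laptops&page=3
```

urljoin also copes with hrefs that are already absolute (it returns them unchanged), which plain string concatenation does not.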
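The while True approach for sites whose pages have unrelated URLs can be made safer with two small guards. The sketch below is written under stated assumptions:

- A set of already-visited URLs, so that a cycle such as page 2, 1, 2, 1 stops instead of looping forever.
- A max_pages cap.

The fetch and find_next parameters are hypothetical hooks introduced for illustration. In a real scraper, fetch might return requests.get(url).text, and find_next might look up the Next button with BeautifulSoup and return an absolute URL or None. Passing them in keeps the loop itself testable offline.

```python
def follow_next_links(start_url, fetch, find_next, max_pages=50):
    """Visit pages by repeatedly following the Next link.

    fetch(url) -> html and find_next(html, url) -> absolute URL or None
    are hypothetical hooks supplied by the caller.
    Returns the list of URLs visited, in order.
    """
    visited = []
    seen = set()
    url = start_url
    while url is not None and len(visited) < max_pages:
        if url in seen:
            # Stop instead of bouncing between pages forever (e.g. 2, 1, 2, 1).
            break
        seen.add(url)
        visited.append(url)
        html = fetch(url)
        url = find_next(html, url)
    return visited


# Tiny offline demo: each "page" body is just the URL of its Next page.
fake_site = {"page1": "page2", "page2": "page1"}  # a 1 -> 2 -> 1 cycle
print(follow_next_links("page1", fake_site.get, lambda html, url: html))
# → ['page1', 'page2']
```

The loop ends in three ways. It ends normally when find_next returns None on the last page. It ends early when a URL repeats. It ends as a safety stop at max_pages.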
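For Flipkart-style URLs where only the trailing page number changes, the for-loop approach can be sketched as follows. This assumes the number is carried in a query parameter named page, so check the real URL in your browser. The search URL is a placeholder.

The video edits the URL string and appends str(i). This sketch instead rewrites the page parameter with urllib.parse, which keeps the other query parameters intact and still works if page is not the last parameter in the URL.

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse


def page_urls(base_url, first, last):
    """Return URLs for pages first..last (inclusive) of a paginated search.

    Assumes the page number lives in a `page` query parameter; any existing
    value is replaced and the other parameters are kept in order.
    """
    parts = urlparse(base_url)
    query = parse_qs(parts.query)
    urls = []
    for page in range(first, last + 1):
        query["page"] = [str(page)]
        new_query = urlencode(query, doseq=True)
        urls.append(urlunparse(parts._replace(query=new_query)))
    return urls


# Placeholder search URL; only the page parameter changes between pages.
for url in page_urls("https://www.flipkart.com/search?q=laptops&page=1", 2, 4):
    print(url)
```

Calling page_urls(url, 2, 10) produces the same nine pages (2 through 10) as the video's range(2, 11). Each URL would then be passed to requests.get as before.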
Info
Channel: WsCube Tech! ENGLISH
Views: 23,951
Keywords: scrape multiple pages python beautifulsoup, scrape multiple pages using python, scrape multiple pages with beautifulsoup, web scraping with python, python web scraping, web scraping multiple pages, web scraping multiple pages tutorial, web scraper python tutorial, web scraping, web scrapping, beautifulsoup, beautifulsoup4 python, web scraping amazon product, web scraping amazon product with python, beautiful soup, amazon web scraping, python beautifulsoup, wscube tech python
Id: 704hLk559c8
Length: 13min 9sec (789 seconds)
Published: Sat Jan 21 2023