Extract Data from Nested HTML Tags - Explained | Web Scraping Tutorials [English] 🔥

Video Statistics and Information

Captions
Hello everyone, and a very warm welcome to WsCube Tech. In our previous session we covered how to extract data with BeautifulSoup and its find_all() function, and we also used pandas to convert that data into a CSV file. In today's session we will learn how to extract data from nested HTML tags.

So what are nested HTML tags? Let me explain by going back to our website and opening the browser's inspector. In the previous session we learned how to extract all the data, but what if I want the data from a single box only? Extracting data from this kind of nested structure is what we will cover today. If I collapse some of these divs and then move the cursor over them, you will notice that the highlighted box on the page changes, which shows that each of these tags is associated with one block. So all I need to do is copy this class and go back to our program.

In the program I'll write import requests and from bs4 import BeautifulSoup. We will fill in the URL a little later. Before creating the soup we need to fetch the page, so r = requests.get(URL), with the URL passed inside the parentheses. After that, soup = BeautifulSoup(r.text, "lxml"); note that r.text is not in double quotation marks, and the page is parsed with lxml. Now, what we need here is all the div tags on the page, that is, the tags of all the boxes.
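The setup step described above can be sketched as follows. This is a minimal sketch, not the video's exact code: fetch_soup() wraps the requests.get() and BeautifulSoup(r.text, "lxml") calls from the narration (the timeout and raise_for_status() are additions), while the demo parses a small hypothetical HTML string with Python's built-in "html.parser" so it runs offline. The class name product-box is a placeholder for the class copied from the browser.

```python
from bs4 import BeautifulSoup


def fetch_soup(url: str) -> BeautifulSoup:
    """Download a page and parse it, as in the video (needs requests and lxml)."""
    import requests  # imported lazily so the offline demo below runs without it

    r = requests.get(url, timeout=10)
    r.raise_for_status()  # addition: fail loudly on HTTP errors
    return BeautifulSoup(r.text, "lxml")


# Hypothetical stand-in for the downloaded page; "product-box" is a placeholder class.
SAMPLE_HTML = """
<div class="product-box">
  <a href="#">Galaxy Tab 3</a>
  <p class="description">7 inch, 8 GB, Wi-Fi, Android 4.2, white</p>
</div>
"""

# "html.parser" ships with Python, so no extra parser package is needed here.
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(soup.find("div", class_="product-box").a.text)  # Galaxy Tab 3
```

Against the live site you would call fetch_soup(URL) instead of parsing SAMPLE_HTML.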
For that I'll write boxes = soup.find_all("div", class_=...), passing the class we copied from the browser. Then I'll go back to the browser, copy the page URL, paste it into the program, and print boxes. As soon as I run it, it prints everything inside the boxes. To check whether this is actually working, go back to the website: there are 21 blocks, I believe, so the length of boxes should also be 21. Let's print len(boxes) and run it. It tells me the length is 21, which means we are matching exactly the right elements.

But I don't need all of the data; I want the data of a single box. This time, instead of find_all(), I'll write box = soup.find("div", class_=...) with the same class copied from the page. (I accidentally removed the double quotation marks while pasting, so let me put them back.) Then I'll pass the index of the box we want. Indexing starts at 0, and we can pick any index; for example, if I want this Galaxy Tab, counting 0, 1, 2, 3 puts it at index 3, so let's pass 3. I wanted to name the variable box, but I typed something else by mistake.
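Collecting the boxes, checking their count, and picking one by position can be sketched like this. The page is hypothetical: product-box is a placeholder class, and only "Galaxy Tab 3" comes from the video (the other three names are made up). It also shows why putting the index on find() fails while putting it on the find_all() result works.

```python
from bs4 import BeautifulSoup

# Hypothetical page: four product boxes. "product-box" and the first three
# names are placeholders; only "Galaxy Tab 3" comes from the video.
names = ["Tablet A", "Tablet B", "Tablet C", "Galaxy Tab 3"]
html = "".join(f'<div class="product-box"><a href="#">{n}</a></div>' for n in names)
soup = BeautifulSoup(html, "html.parser")

# find_all() returns every matching <div> in a list-like ResultSet.
boxes = soup.find_all("div", class_="product-box")
print(len(boxes))  # 4 here; the video's page gives 21

# Positions are 0-based, so the fourth box is boxes[3].
box = boxes[3]
print(box.a.text)  # Galaxy Tab 3

# find() returns only the first match, a single Tag. Square brackets on a Tag
# look up an HTML attribute, so an integer "index" raises KeyError.
first = soup.find("div", class_="product-box")
try:
    first[3]
except KeyError as err:
    print("KeyError:", err)  # KeyError: 3
```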
It came out as walks, so let me fix it and print box. When I run it, there is a KeyError: 3. Indexing works with find_all(), not with find(): find() returns only the first matching div tag, and putting a number in square brackets on a single tag looks up an HTML attribute with that name, which doesn't exist. Because we want all the boxes and then the one at index 3, we use find_all() instead: box = soup.find_all("div", class_=...)[3]. Let's also comment out the earlier lines, since they're no longer needed.

Now, what do we want from this box? Let's say the price, or the name. For the name, I'll write name = box.find("a").text, because the name is inside the a tag. Let's print name, and as soon as I run it, it says Galaxy Tab 3. That is how we have extracted the data. If I go back to the inspector and select this element, you can see it's in an a tag, and from that a tag we extracted Galaxy Tab 3. Similarly, the description is in a p tag, so we can extract it as well. The ratings are also in a p tag, which is why we need to specify the class. Let's copy the class, go back, and write description = box.find("p", class_="description").text, then print description. When I run it, I get this tablet's description: 7 inches, 8 GB, Wi-Fi, Android 4.2, white in color. This is how you can get the details of any particular box, and how you can extract nested data from any HTML.

This page also has other things, for example this navigation bar, which takes us to Home, Computers, Laptops and
Tablets. Suppose you want to access this too. It's very simple, because the steps are almost the same. (I'll copy the link from the browser and paste it back, since I think I messed up the URL earlier.) To get anything from this menu, I choose Inspect again. The item itself is in an a tag, and the whole menu is in a ul tag. ul stands for unordered list, meaning it holds a list of elements without numbering. So I'll copy the class, go back to the program, and write soup.find_all("ul", class_=...). The ul also has an ID, and we should provide it, because if there are other ul tags with the same class the match won't be specific. The ID is side menu, so let's copy it. For the ID you don't add an underscore the way you do for class_; you simply write id= followed by the value in quotation marks.

Now let's store this in a variable: nav_bar = soup.find_all(...). From it we want, for example, Computers, so I'll write text = nav_bar.find_all(...). The Computers link is an a tag inside a list item, and if I expand it, I can see the list item's class is active, so let's copy that. If I collapse these, you can see there are three list items: Home, Computers and Phones. To get to Computers, I need to go inside the list items, so I'll pass "li", class_="active" (the class name here is active), then the index, which I believe is 0, and finally .text. Alternatively, you can print text.text directly, which looks a little different but also works.

When I run it, I get an index out of range error for index 3, because the earlier code with [3] still runs against the new data. So let me comment everything out and run it again. Now the error says the object has no such attribute and asks whether I meant find(). So let's try find(). When I run it, I get Computers, Laptops and Tablets, but we only want Computers, so we need to provide the index here as well. After switching back to find_all() and running again, I still don't get what I want. So let's go back to the page. In this navigation bar we have three things: Home, Computers and Phones, and inside Computers we have Laptops and Tablets. That whole Computers block is what we need to reach. Let me check its index: it's at index 1, and the element inside it is at index 0. I don't need to click Computers; let me go back.
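Both errors above come from mixing up a ResultSet (what find_all() returns) with a single Tag (what find() or an index returns). A minimal sketch, assuming a sidebar shaped like the one in the video; the id "side-menu" and the exact markup are assumptions reconstructed from the narration, not copied from the real page. It uses find() for the ul because an id should be unique on a page.

```python
from bs4 import BeautifulSoup

# Hypothetical sidebar reconstructed from the narration; the id "side-menu"
# and the nesting are assumptions, not copied from the real page.
html = """
<ul class="nav" id="side-menu">
  <li><a href="#">Home</a></li>
  <li class="active">
    <a href="#">Computers</a>
    <ul>
      <li><a href="#">Laptops</a></li>
      <li><a href="#">Tablets</a></li>
    </ul>
  </li>
  <li><a href="#">Phones</a></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# An id should be unique on a page, so find() returns exactly one Tag.
nav_bar = soup.find("ul", id="side-menu")

# find_all() returns a ResultSet, a list of Tags with no .text of its own.
active_items = nav_bar.find_all("li", class_="active")
try:
    active_items.text
except AttributeError:
    print("ResultSet has no .text; index it or use find()")

# Indexing the ResultSet gives a single Tag.
active = active_items[0]
print(active.find("a").text)             # Computers
print(active.get_text(" ", strip=True))  # Computers Laptops Tablets
```

The .text of the whole li includes the nested sub-list, which is why the video first saw Computers, Laptops and Tablets together; taking the li's first a tag isolates Computers.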
So we need to get to the Computers block. Back in the code, I'll use soup.find() to find the list item. The list item has a class, active, so let's copy it and write "li", class_="active", then take its text. Rather than printing it directly, let's assign it to a variable, name = soup.find("li", class_="active"), and then print name.text. When I run it, I get an index out of range error again, so I need to use index 0. Yes: now we get Computers, and under Computers we get Laptops and Tablets.

This is how you can find any data on your page. If you have nested data, all you need to do is go to its parent element and grab the data from that parent. I hope you have no doubts about how we extracted data from nested HTML tags. In our next session we are going to talk about how to extract data from tables, so stay connected, and I'll see you in the next session. Thank you!
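The closing advice (go to the parent, then search inside it) extends naturally from one box to every box on the page. A sketch under the same assumptions as before: product-box is a placeholder class, the a and p.description children mirror the tags shown in the video, and the second product is made up to show a box with a missing field. The None checks are an addition, not something the video does.

```python
from bs4 import BeautifulSoup

# Hypothetical product grid. "product-box" is a placeholder class; the <a> name
# and <p class="description"> children follow the tags shown in the video.
HTML = """
<div class="product-box">
  <a href="#">Galaxy Tab 3</a>
  <p class="description">7 inch, 8 GB, Wi-Fi, Android 4.2, white</p>
</div>
<div class="product-box">
  <a href="#">Example Tablet</a>
</div>
"""


def extract_products(html: str) -> list[dict[str, str]]:
    """Return one {"name", "description"} dict per product box."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # Step 1: locate every parent box.
    for box in soup.find_all("div", class_="product-box"):
        # Step 2: search only inside this box, so fields never mix between boxes.
        name_tag = box.find("a")
        desc_tag = box.find("p", class_="description")
        rows.append({
            # find() returns None when a child is missing; fall back to "".
            "name": name_tag.get_text(strip=True) if name_tag is not None else "",
            "description": desc_tag.get_text(strip=True) if desc_tag is not None else "",
        })
    return rows


for row in extract_products(HTML):
    print(row)
```

Each dict is one row, so the list can be passed to pandas.DataFrame and saved with to_csv(), as in the previous session.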
Info
Channel: WsCube Tech! ENGLISH
Views: 19,753
Keywords: WsCube Tech English, nested html element, nested Html tags, nested Html tags tutorial, extracting data from nested Html tags, scraping tutorials, nested ordered list html, extracting data, extracting data tutorial, neting list in html, html attributes, web scraping, web scraping for beginners, nested list at html, html, html tags, html tags tutorial, web scraping with python, html for beginners, html scraping data, nested html tags data scraping
Id: teHDlOzfN-A
Length: 13min 3sec (783 seconds)
Published: Wed Jan 11 2023