How to Scrape a Table From a Website using BeautifulSoup - Complete Tutorial [English]

Captions
A very warm welcome to WsCube Tech. So guys, in the previous session we covered how we can extract data from nested HTML tags. In today's session we are going to start a completely new module, where we will learn how to extract data from a table. There are many websites that have a table, and if you want to extract the data from that table, how can you do it? We will be learning that in this module.

So let's have a look. First of all, I'll show you some websites that have tabular data. If I come over here in my browser, the test site does not have any tabular data. But if I check this site, which lists the coronavirus-affected districts and cities in India, it has tabular data: as you can see, we have rows and we have columns, and the data goes like this. Another example of tabular data is the IPL auction table, or let's say the IPL auction stats. If you look at football statistics, some of that data is in a tabular format too. And here, for best stock research, I have found a website that shows us tabular data. So these are a few examples of tabular data, and how to extract data from these tables is what we will be learning in this session.

Let's start with this one and go over to our code. Let's import requests and import pandas as pd. Why are we using pandas? Because at the end we will be storing this data in CSV files. Then we write from bs4 import BeautifulSoup.

Once we are done with this, the next thing we need to do is write the URL, so url = and, in double quotes, we pass the URL. We also write r = requests.get(url), and then print r. Whenever you are trying to scrape a new website, printing r is really important, just to get an idea of whether you are getting the HTML of that website or not. So if I copy this address, go back, and paste it in our url, let's see what output we get as soon as I run it. Here the response is 403, which means this page will not allow us to get its HTML.

So let's try something else; maybe the next website will allow us. I'll write url = and, in double quotes, the next URL. I'll go to the other website and take this one, ticker.finology.in, then go back to our program and paste it here. As soon as I run it, the response is 200, which means we can scrape the data: we can get the HTML from here.

The next thing we need to do is create our soup. "Create" sounds a little weird when we are talking about soup, so let's cook our soup instead: soup = BeautifulSoup(r.text, "lxml").

Now, what do we need to do if we are looking for a table? Let's go to the website and understand what is going on there. I right-click and click on Inspect to find the table tag; all tables start with the table tag. All I need to do is click on anything in the table. For example, I clicked on this company, and I got the code associated with it, and a little above that I have the table. As soon as you click on this table tag, you will see that the whole table is highlighted on the page, which means this is the data we will be working on. Let's double-click its class name and copy it. We go back to our program and write table = soup.find, where we are looking for a table tag, with class_ equal to the class name we copied. And let's print table.

As soon as I run it, this is how the output looks: we have a table that has th tags, and inside the th tags we have Company, Price, I guess, and the Day High. What are these? These are the headers. So in this video we will be learning how to obtain the headers from a table.

For a table we need to understand one very simple thing: everything in it is in rows, and the headers themselves are also in a row. If I go back to the page, you can see the headers are in a tr tag, and inside the tr they are th tags. So tr is a table row, and inside that row the th tags are the table headers. That means we will be searching for the th tag. If I look at the page again, the th tag has no class. So let's come back here, and instead of printing the table we write headers = table.find_all("th") and print headers. As soon as I run it, we receive the headers, but they are still tags.

What we need now is the text. To obtain it, let's comment out the print statement and iterate with a for loop. First we create an empty list called titles, so titles = []. Then we write for i in headers: and, since we need the string from each tag, inside the loop title = i.text. We need to append this title to our list of titles, so I'll write titles.append(title). Lastly, I'll print titles. As soon as I run it, we get three things: Company, Price (rupees), and Day High (rupees). And if I go back and look at the table, it also says Company, Price (rupees), and Day High (rupees). So we have three columns here, and you can assign them as headers as well.

To assign them as the headers of a data frame, we first need to create a data frame. Let's comment out the print, and write df = pd.DataFrame(columns=titles). So this titles list is my set of columns. Now we print df. As soon as I run it, we have a data frame that is completely empty right now, because so far we have only obtained the headers.

So I hope, guys, it is clear how to extract the headers from a table. All you need to remember is this: first, send the request to the website and check the response; after that, look for the table tag; and inside the table, look for the th tags, which are the table headers. In the next session we will learn how to extract the rest of the data from our table. I hope you have no doubts or questions, and I'll see you in part 2 of this video. Thank you.
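The steps described in the session can be sketched roughly as below. This is a minimal sketch rather than the exact code shown on screen: SAMPLE_HTML, the class name "stocks", and the helper names fetch_html and extract_headers are hypothetical, the built-in "html.parser" is used so that no extra parser has to be installed, and get_text(strip=True) is swapped in for .text so that stray whitespace is trimmed.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup

# Hypothetical stand-in for a fetched page; this is NOT the real site's markup.
SAMPLE_HTML = """
<table class="stocks">
  <tr><th>Company</th><th>Price (Rs.)</th><th>Day High (Rs.)</th></tr>
  <tr><td>Example Ltd</td><td>100.0</td><td>105.5</td></tr>
</table>
"""


def fetch_html(url):
    """Return the page HTML, raising if the server refuses the request."""
    r = requests.get(url, timeout=10)
    r.raise_for_status()  # a 403 response raises requests.HTTPError here
    return r.text


def extract_headers(html, table_class=None):
    """Return the text of every <th> in the first matching <table>."""
    soup = BeautifulSoup(html, "html.parser")
    if table_class is None:
        table = soup.find("table")
    else:
        table = soup.find("table", class_=table_class)
    if table is None:
        return []  # no matching table on the page
    return [th.get_text(strip=True) for th in table.find_all("th")]


# With a live site, the HTML would come from fetch_html(url) instead.
titles = extract_headers(SAMPLE_HTML, table_class="stocks")
df = pd.DataFrame(columns=titles)  # no rows yet, only the column headers

print(titles)
print(df)
```

The data frame is empty at this point because only the header row has been read; the td cells in the remaining rows are not touched yet.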
Info
Channel: WsCube Tech! ENGLISH
Views: 37,349
Keywords: web scraping tables, how to scrape tables, scrap html tables with python, scraping a table from a website, scraping html table tutorial, scrape table data, scraping table data for beginners, web scraping html tables, scraping table data using beautifulsoup, beautifulsoup table scraping, beautifulsoup for python, web scraping using python, beautifulsoup python, beautiful soup, data scraping, beautifulsoup tutorial, web scraper, python beautifulsoup, web scrapping, wscube tech
Id: T1qv3ksMDq4
Length: 8min 41sec (521 seconds)
Published: Fri Jan 13 2023