How to Scrape a Table From a Website using BeautifulSoup - Complete Tutorial [English]

Captions

A very warm welcome to WsCube Tech. So guys, in the previous session we covered how to extract data from nested HTML tags. In today's session we are going to start a completely new module, where we will learn how to extract data from a table. There are many websites that have a table, and if you want to extract the data from that table, how can you do it? We will be learning that in this module.

So let's have a look. First of all, I'll show you some websites that have table data. If I come over here in my browser, the test site does not have any tabular data. But if I check this site, which lists coronavirus-affected districts and cities in India, it has tabular data: as you can see, we have rows and we have columns, and the data goes like this. Another example of tabular data is the IPL auction table, or let's say the IPL auction stats. Similarly, if you look at football statistics, some of that data is in a tabular format too. And here, under best stock research, I have found a website that shows tabular data. So these are a few examples of tabular data, and how to extract data from these tables is what we will be learning in this session.

So let's start with this one and go over to our code. Let's write import requests, then import pandas as pd. Why are we using pandas? Because at the end we will be storing this data in CSV files. And then from bs4 import BeautifulSoup. Once we are done with this, the next thing we need to do is write the URL: url = and, in double quotes, we will pass the URL. Then we'll write r = requests.get and pass our url. Let's print r. Whenever you are trying to scrape a new website, printing r is really important here,
just to get an idea of whether you are getting the HTML of that particular website or not. So if I copy this URL, go back, and paste it in our url, let's see what output we get as soon as I run it. Here the response is 403 (Forbidden), which means this page will not allow us to get its HTML. So let's try something else; maybe the next website will allow us. I'll write url = and, in double quotes, the next URL. I'll go to the same website here and pick this one, ticker.finology.in, then go back to our program and paste it here. As soon as I run it, the response is 200, which means we can scrape the data: we can get the HTML from here.

Once we are done with this, the next thing we need to do is create our soup. Actually, "create" sounds a little weird when we are talking about soup, so let's cook our soup, and a very beautiful one: soup = BeautifulSoup, and here we will pass r.text, "lxml".

Now, what do we need to do if we are looking for a table? Let's go to the website and understand what is going on. If I right-click here and click Inspect, I can get the table tag; all tables start with the table tag. If I go to the navigation bar, all I need to do is click on anything here. For example, I clicked on Company and got the code associated with Company, and a little above that I have the table. As soon as you click on this table, you will see that the whole table is highlighted, so this is the data we will be working on. Let's double-click this and copy it. We'll go back to our program, and for the table we will write table = soup.find, where we are looking for a "table" tag, comma class_ = the value we copied. And let's print table. As soon as I run it, this is how the output looks: we have a table that has something like th, and
inside the th we have Company, then, I guess, Price and Day High. So what are these? These are the headers, and in this video we will learn how to obtain the headers from a table.

For a table, we need to understand one very simple thing. If I go back here, everything in the table is in a row, and the headers themselves are also in a row. You can see they are in a tr tag, and inside the tr there are th tags. That means tr is a table row, and within that row the th cells are the table headers. So we will be searching for the th tag here. If I go back and check, we have a th tag with no class. So let's come back, and instead of printing the table we will write headers = table.find_all("th") and print headers. As soon as I run it, it gives me something like this: we have received the headers.

But now we need to get the text from them. To obtain the text, let's put the print statement in a comment and iterate with a for loop. First, let's create an empty list called titles: titles = an empty list. Then for i in headers, colon: now we are iterating over the headers, and we need to obtain the string. For that we will write title = i.text, and we need to append this title to our titles list, so I'll write titles.append and pass title inside it. Lastly, I'll print titles, and let's have a look at what we get. As soon as I run it, we get three things: Company, Price Rupees, and Day High Rupees. And if I go back and look at the table, it also says Company, Price Rupees, and Day High Rupees. So that means we have got three
columns over here, and you can assign them as headers as well. To assign them as headers in our data frame, we need to create a data frame. So let's put this part in a comment. To create a data frame, all we need to do is write (I'll just shift this a little bit) df = pd.DataFrame, and in it I'll pass columns = titles. So this titles thing is my columns; actually it's a list, not a thing, but okay. Now we will print the data frame. As soon as I run it, we have a data frame that is completely empty right now, because we have only obtained the headers so far.

So I hope, guys, it is clear how to extract the headers from a table. All you need to remember is this: first, send a request to the website and get the response; after that, look for the table tag; and within the table, look for the th tags, which are the table headers. In the next session we will learn how to extract the rest of the data from our table. So I hope you have no doubts or questions, and I'll see you in part 2 of this video. Thank you.
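
The steps from this session can be sketched roughly as follows. The captions do not show the site's actual markup or class name, so this sketch runs on a small inline HTML sample instead of a live page. The SAMPLE_HTML string, the class name screener-table, and the helper name extract_headers are all assumptions made for illustration. It also uses Python's built-in html.parser so that only bs4 and pandas need to be installed.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for the HTML that r.text would contain.
# The class name "screener-table" is an assumption, not the real site's class.
SAMPLE_HTML = """
<table class="screener-table">
  <tr><th>Company</th><th>Price Rs.</th><th>Day High Rs.</th></tr>
  <tr><td>Example Ltd</td><td>100</td><td>105</td></tr>
</table>
"""


def extract_headers(html, table_class):
    """Return the text of every <th> in the first table with the given class."""
    # "Cook the soup" from the raw HTML string.
    soup = BeautifulSoup(html, "html.parser")
    # Find the first <table> tag carrying the given class.
    table = soup.find("table", class_=table_class)
    if table is None:
        # No matching table on the page: nothing to extract.
        return []
    # Collect the text of each header cell; strip() drops stray whitespace.
    return [th.text.strip() for th in table.find_all("th")]


titles = extract_headers(SAMPLE_HTML, "screener-table")

# A data frame with only column names is empty until the rows are added.
df = pd.DataFrame(columns=titles)

print(titles)
print(df)
```

For a live page, the same helper could be fed r.text after r = requests.get(url), once r.status_code has been checked to be 200 rather than something like 403.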

Info

Channel: WsCube Tech! ENGLISH

Views: 37,349

Keywords: web scraping tables, how to scrape tables, scrap html tables with python, scraping a table from a website, scraping html table tutorial, scrape table data, scraping table data for beginners, web scraping html tables, scraping table data using beautifulsoup, beautifulsoup table scraping, beautifulsoup for python, web scraping using python, beautifulsoup python, beautiful soup, data scraping, beautifulsoup tutorial, web scraper, python beautifulsoup, web scrapping, wscube tech

Id: T1qv3ksMDq4

Length: 8min 41sec (521 seconds)

Published: Fri Jan 13 2023