Data Scraping from Websites to Excel | Web Scraper Chrome Extension

Captions
Hey guys, I hope you're having a great week. Today I want to share a quick hack I found just this week. I needed to go through the NFT New York website and check the whole list of speakers, and doing that manually would have taken me about a week. So I tried to find a better way. I tried ChatGPT and it didn't work, but then I found the right plugin, and it solved the problem immediately.

Let me show you what I'm talking about. This is the NFT New York website, and I need the speakers for 2023. As you can see, the site has a separate category for each speaker type, and each category has a long list of people. Going through these lists by hand would take me at least two days, which is a waste of time for purely mechanical work. The great plugin I found for Chrome is called Web Scraper. You just install it; I've already installed it, so I don't need to do that again. I'll show you how it works in a minute, but first let's talk about how the plugin works.

Let's pretend this is the website you want to get data from. At the top we have the root: the main link, like the home page of the site. Under it we have nested links, that is, nested pages (I've simplified the page here). After the nested links we need to create a table wrapper. The table wrapper has the Element type, and it's the thing we're going to work with, because it determines what goes into your table. Once you've created the table wrapper, you add a set of elements inside it. In my case I only need names and companies, but it can be any kind of text you need from the pages: a first name, a last name, a phone number, an address. What happens then is that you get an Excel sheet with all of this data in one table, one record per row. Without the table wrapper element your data would come out messy, and you would still have to clean it up in your Excel sheet.

So let's go back to the NFT New York website, where we have the speakers. As I said, that page will be the main (root) page: I don't need all of the data from NFT New York, just the speakers, so the speakers page is going to be our parent. Under it we have nested links to each category page: Featured, Art, Brands, Community, and others. From each category we want the names of the people and the companies they work at.

Open Inspect (Chrome's developer tools) and go to the plugin you just installed, Web Scraper. Create a new sitemap. The new sitemap starts at the highest-level link, which in my case is the speakers page, and I'll call it NFT New York 2023. Now I create the sitemap.

We now have a root, which is our speakers page. Next we create a nested selector for the categories: we need to select all of the categories so that Web Scraper goes to each page and reads the data from it. I'll call this selector "categories". Its type is Link, because each category is a link to its own page; you'll see speakers/featured, speakers/art, and so on. We want the scraper to visit every one of those links, so we need to select the elements for it to go through. First click Featured, then click another element of the same kind, and the tool automatically understands that it should go through all of these page links and read the data from them. Then click Done selecting.

You can preview the data now, but you'll see only Featured. Why is that a problem? Because we want all of the categories, each one separately. So tick Multiple, and now you'll see all your categories in different rows, which means each category's data will be collected separately. Then we save this selector.

Now we go inside "categories", and for each page we need to define what to collect. First we need the wrapper. You see, I almost made a mistake: I wanted to go straight to the text of the names and companies, but that's too early. We need to create the table first, so we need a wrapper. Let's call it "data wrapper". Its type should be Element, as I mentioned before, and because we need every element I'm about to select, we keep it set to Multiple.

Now we select the element. The element is basically the component that contains all the nested children: the data we need sits inside that component. I don't need the image or the vote button, but I do need the name and the company name. I'm selecting multiple components, because we need all of them on the page. Click Done selecting, then Element preview: I can see that 71 elements were selected, so we know everything on the page was picked up. Data preview won't show anything yet, because we haven't added anything to our table wrapper component. Let's save the selector.

Now we go into that wrapper and add our data, the actual content we want in the table. First, as I said, I need names. The type is Text, and I select the name. Awesome, done. I don't need to tick Multiple, because we're selecting within one component only. Then we create another one, the company selector: type Text, select the company name. Perfect. Again no Multiple, just one per component, so each person's data ends up in one row, very organized and clean. Boom, we're done. We can preview the data if we want to: all the names here, all the companies here. Perfect.

Now we need to scrape. Let's go to Scrape and click Start scraping. You can see what's going on: it's checking each page and finding the particular properties I asked it to find. Let's see what we end up with. Pretty cool; it saved me so much time. Let's just wait.

Awesome. Once you refresh, you'll see all the data. Here we go, we're done. You have the category, the name, and the company: Legal, Gaming... It's such a long list, I'm curious how many people there are. Let's figure it out. Oops, that's the wrong one, sorry. I need to export, and there are two options: you can export to CSV or XLSX. Let's see what we get. Boom, here we go: company names. Do we need the categories? We do, but not this category column. OK, I'm going to delete every column I'm not using, and here we go, we have a clean list. How many items are there? Let me check... Wow, it's about three thousand. That's crazy.

It takes maybe five, let's say ten, minutes to figure out how it works, and then you're good to go. That's the hack I wanted to share with you today. Please let me know in the comments below whether you're going to use it, how, and on which project; I'd love to know. Hit the Subscribe button, and I'll see you in the next video. Cheers!
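The selector tree built in the video (root, then a Link selector for the categories, then an Element selector as the table wrapper, then one Text selector each for name and company) can also be written down as a sitemap and imported into Web Scraper. The Python sketch below is an illustration under assumptions: the JSON field names ("_id", "startUrl", "selectors", "parentSelectors", "SelectorLink", and so on) reflect my understanding of the extension's sitemap export format, not a documented contract, and the start URL, the sitemap id, and all CSS selectors are hypothetical placeholders, since the video doesn't show them. Compare the output with an export of a sitemap you built in the extension before importing it.

```python
import json

# Hypothetical placeholders: the video does not show the real start URL or the
# CSS selectors, so replace these with ones taken from the page you scrape.
START_URL = "https://example.com/speakers"
CATEGORY_LINKS = "a.category-link"
SPEAKER_CARD = "div.speaker-card"
NAME_TEXT = "h3.speaker-name"
COMPANY_TEXT = "p.speaker-company"


def selector(sel_id, sel_type, parent, css, multiple, **extra):
    """One selector entry; field names are assumed to match Web Scraper's export."""
    entry = {
        "id": sel_id,
        "type": sel_type,
        "parentSelectors": [parent],
        "selector": css,
        "multiple": multiple,
        "delay": 0,
    }
    entry.update(extra)
    return entry


def build_sitemap():
    """Root -> categories (Link) -> data-wrapper (Element) -> name/company (Text)."""
    return {
        "_id": "nft-new-york-2023",  # hypothetical slug of the name used in the video
        "startUrl": [START_URL],
        "selectors": [
            # Multiple=True so every category page is followed, not just the first.
            selector("categories", "SelectorLink", "_root", CATEGORY_LINKS, True),
            # The "table wrapper": one element (one output row) per speaker card.
            selector("data-wrapper", "SelectorElement", "categories", SPEAKER_CARD, True),
            # One value per card, so these text selectors are not Multiple.
            selector("name", "SelectorText", "data-wrapper", NAME_TEXT, False, regex=""),
            selector("company", "SelectorText", "data-wrapper", COMPANY_TEXT, False, regex=""),
        ],
    }


def check_parents(sitemap):
    """Raise ValueError if any selector names a parent that is not defined."""
    known = {"_root"} | {s["id"] for s in sitemap["selectors"]}
    for s in sitemap["selectors"]:
        for parent in s["parentSelectors"]:
            if parent not in known:
                raise ValueError(f"{s['id']!r} has unknown parent {parent!r}")


if __name__ == "__main__":
    sitemap = build_sitemap()
    check_parents(sitemap)
    print(json.dumps(sitemap, indent=2))
```

The parent check mirrors the nesting rule from the video: the text selectors only make sense inside the wrapper, and the wrapper only inside the category pages.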
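Why the table wrapper matters can be shown with a small standard-library Python sketch. The markup is made up (it is not the NFT New York site's HTML) and uses a hypothetical `card` class for each speaker component. Reading the name and company inside each card, as the wrapper does, keeps every record on one row; collecting page-wide lists of names and companies and lining them up by position afterwards goes wrong as soon as one card lacks a field.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Made-up, well-formed markup standing in for one category page.
# Note that Carol's card has no company line.
SAMPLE_PAGE = """
<div>
  <div class="card"><img src="a.png"/><h3>Alice</h3><p>Acme</p></div>
  <div class="card"><img src="b.png"/><h3>Bob</h3><p>Globex</p></div>
  <div class="card"><img src="c.png"/><h3>Carol</h3></div>
  <div class="card"><img src="d.png"/><h3>Dan</h3><p>Initech</p></div>
</div>
"""


def rows_with_wrapper(page: str) -> list[tuple[str | None, str | None]]:
    """Wrapper approach: read name and company inside the same card."""
    root = ET.fromstring(page.strip())
    rows = []
    for card in root.iterfind(".//div[@class='card']"):
        # findtext returns None when the card has no such child element;
        # the <img> children are simply never read.
        rows.append((card.findtext("h3"), card.findtext("p")))
    return rows


def rows_without_wrapper(page: str) -> list[tuple[str, str]]:
    """No wrapper: two page-wide lists paired up by position."""
    root = ET.fromstring(page.strip())
    names = [h3.text for h3 in root.iter("h3")]
    companies = [p.text for p in root.iter("p")]
    return list(zip(names, companies))  # zip stops at the shorter list


print(rows_with_wrapper(SAMPLE_PAGE))
print(rows_without_wrapper(SAMPLE_PAGE))
```

With this sample, the wrapper version returns four rows with `None` as Carol's company, while the positional version pairs Carol with Initech and drops Dan entirely.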
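The clean-up at the end of the video (deleting unused columns from the export and counting the rows) can be scripted as well. This is a minimal sketch with Python's `csv` module. It assumes one header row and column names equal to the selector ids; the extra columns in the sample (`web-scraper-order`, `web-scraper-start-url`, `categories-href`) are my assumption about what the extension adds, and the data rows are invented, so check the header of your own export.

```python
import csv
import io

# Hypothetical column names: selector ids from your sitemap. Link selectors are
# assumed to add an extra "<id>-href" column, and the extension is assumed to add
# bookkeeping columns such as "web-scraper-order"; check your own header row.
KEEP = ["categories", "name", "company"]


def clean_export(csv_text: str, keep=KEEP, required: str = "name") -> list:
    """Keep only the wanted columns and drop rows whose `required` field is empty."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [col for col in keep if col not in (reader.fieldnames or [])]
    if missing:
        raise KeyError(f"columns not found in export: {missing}")
    rows = []
    for row in reader:
        # A short row yields None for absent fields, hence the `or ""`.
        slim = {col: (row[col] or "").strip() for col in keep}
        if slim[required]:
            rows.append(slim)
    return rows


def to_csv(rows: list, keep=KEEP) -> str:
    """Serialize the cleaned rows back to CSV text with only the kept columns."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=keep)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


SAMPLE_EXPORT = (
    "web-scraper-order,web-scraper-start-url,categories,categories-href,name,company\n"
    "1-1,https://example.com/speakers,Legal,https://example.com/speakers/legal,Alice,Acme\n"
    "1-2,https://example.com/speakers,Gaming,https://example.com/speakers/gaming,Bob,Globex\n"
    "1-3,https://example.com/speakers,Gaming,https://example.com/speakers/gaming,,\n"
)

rows = clean_export(SAMPLE_EXPORT)
print(len(rows), "speakers")
print(to_csv(rows))
```

For a real file, pass in `open(path, newline="", encoding="utf-8-sig").read()`; the `utf-8-sig` codec also strips a byte-order mark if one is present. `len(rows)` is then the speaker count.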
Info
Channel: Khrystyna Sopova
Views: 9,876
Id: TS2QRkrIO74
Length: 9min 38sec (578 seconds)
Published: Sat Mar 18 2023