C# Web Scraper To Parse Ebay Html #2

Captions
Welcome to part 2 of the eBay web scraper. In the last video we made a list of products from eBay, and this list holds the HTML for each of those products. In this video I'm going to go over how to extract all that data from our list of HTML. Let's create a foreach loop so that we can grab the data from each of our items.

The first thing I want to pull out is the listing ID. If I were putting this into a database somewhere, I would want some kind of unique identifier, and these listing IDs shouldn't change. If I ever want to look at a product again in the future, I can check whether its ID already exists in my database and then update it, remove it, or do whatever I want with it.

This one's easy because it's right there on the list item, so I can just use GetAttributeValue. Let's look at the HTML to see exactly why. You can see my listing ID is here, and GetAttributeValue gets one of the attributes on this element: I could get the id, the listingid, or the class. So it's GetAttributeValue of listingid, with an empty string as the default if the attribute isn't there. Let me run that. Now I have a list of numbers; I should have 50 of them, one listing ID for each item. Let's put a space in the output so it looks a bit nicer. That's better.

The next thing I want to pull out is the product name itself, so let's see where that is. The product name is stored right here: it's an h3, its class name is lvtitle, and it's a descendant of our list item. With that in mind, let's pull it out: the product list item's Descendants of h3, where the node's class attribute value (or an empty string if there isn't one) equals lvtitle, the value we saw in the HTML. Just for safety I'll throw in a FirstOrDefault, and we want the InnerText of that, which returns a string.

Let's run that and see what we have. It looks okay, but I need to clean it up a bit. This one is completely messed up with tabs or spacing, and there's an extra new line here and here. I'll just trim those extra characters with .Trim, passing in the carriage return, new line, and tab characters. Now when I run this, it looks much better: I have the ID and the name of each item.

So what else do we want to pull from this? How about the price. Where is the price stored in all this HTML? It looks like it's right here, in a list item whose class is lvprice prc, with a space between the two names. I'll copy that. This will be very similar to the product name, so I'll copy that code too: this one is an li, we're matching on the class, and we want this class name. Let me comment this real quick: this is the ID, this is the name, and this is the price. Let's run that.

That looks pretty good, I don't see any problems with that. Oh, here's a problem: we have the price, but we also have this "Trending at" text. I don't want that, so I'll have to clean it up. Anything else? Here's another one with a listing price and a previous price, so I'll have to get rid of that as well; I don't want that in my data either. To clean this up I'm going to use a regular expression, so I'll add Regex in here and add the using for it.

We're going to match this string against the expression I give it. I only want the number, and the number looks like some digits, a period, and then some more digits, so this will take the first match of that pattern. I'll write backslash d, which matches a digit, followed by a plus, then a period, then backslash d plus again. The plus means at least one digit, so I'll have at least one digit before the period, at least one after it, and a period in between. That should give me what I want. It got rid of my dollar sign, which is okay.

Let's see if we have any other inconsistencies. How did this one sell for $0? Interesting. If I had the URL I could go look it up and see what happened, so let's add the URL next. Let's see where it's stored so I know what I'm grabbing. My URL is up here, under this a tag. An a tag can be a hyperlink or an anchor; this one is a hyperlink, which is what we expected. Since it's a descendant of the list item, let's grab it out, and the href is what we're looking for. So we come down to our product list item again: Descendants of a, FirstOrDefault, and then GetAttributeValue of href, so if it finds nothing it gives me an empty string back. Let's select this out and see what it looks like. I'll drop a breakpoint here. OK, it looks like we're grabbing our URLs now. Looking good.

I'm going to look through this to see if I can find any problems, and I'm going to grab one more thing: I want to know what type of listing each of these is, such as Buy It Now. Where's that? Over here: it says Buy It Now, and it's a list item with the class lvformat, so that's what I'm going to search for. I'll copy one of the earlier lines since it's going to be very similar: it's an li with the class lvformat. I'll have to get rid of the regular expression, since I copied it from the price line. Let's see what we get. Good, I have some bids in here: zero bids, thirty-three bids, and here's a Buy It Now, so that looks good.

This looks like a problem, though: we have an extra space in here, which would be a space after my URL, so let me trim that out. I think that'll do it.

I only have fifty items here; if I put a Count in, I can show that. We have 50, and I know eBay will return up to 200, so I'll make that change in my URL, since the URL is what controls that. If I make this 200 instead of 50, I should get 200 results. And we have 200, so that looks good.

I think that's pretty much it for this video. It's just an example of how you can scrape some information from websites; if I wanted to, I could store this in a database and do some more interesting things with it. Thanks for checking it out. Next I'll get into using the eBay API for searching eBay and returning product information and all kinds of good stuff, so keep an eye out for that. That's it for this video, and I'll see you in the next one.

So I did some scrolling and found an interesting listing. This one has zero bids, and it has something else I would have to parse out if I wanted to clean it up: "Reserve not met." Let's take a look at this listing and see what's going on. I'll copy the link with Ctrl+C, go over to eBay, and search for it. This is a reserve-not-met listing: it shows a price of zero dollars because it didn't sell. Anyway, as you can see, the links work, and that's pretty much it.
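The reason given for scraping the listing ID first is that it can serve as a unique key: check whether an ID is already stored, then update it or add it. The sketch below is a hypothetical in-memory version of that idea, assuming a `Dictionary` stands in for the database mentioned in the video; the helper name `Upsert` and the sample IDs are made up for illustration. The empty-string check mirrors the `""` default passed to `GetAttributeValue`.

```csharp
using System;
using System.Collections.Generic;

// In-memory stand-in for the database mentioned in the video, keyed by listing ID.
var listingsById = new Dictionary<string, (string Name, string Price)>();

// Hypothetical helper: insert a listing, or overwrite the stored data when the ID already exists.
// Returns true for a new ID; false when an existing entry was updated or the ID was empty.
static bool Upsert(Dictionary<string, (string Name, string Price)> store,
                   string listingId, (string Name, string Price) data)
{
    // GetAttributeValue("listingid", "") yields "" when the attribute is missing, so skip those.
    if (string.IsNullOrEmpty(listingId))
        return false;

    bool isNew = !store.ContainsKey(listingId);
    store[listingId] = data;
    return isNew;
}

// Made-up sample data: the same listing seen on two runs, with a changed price.
Console.WriteLine(Upsert(listingsById, "100000000001", ("Sample item", "9.99")));  // True
Console.WriteLine(Upsert(listingsById, "100000000001", ("Sample item", "8.49")));  // False
Console.WriteLine(listingsById["100000000001"].Price);                             // 8.49
```

With a real database the same shape applies: look the ID up, then issue an update or an insert.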
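The product-name cleanup passes specific characters to `Trim`. The sketch below uses only `System` and `System.Text.RegularExpressions` with a made-up sample string to show what that call removes and what it leaves behind; `NormalizeWhitespace` is a hypothetical helper, not something from the video. Separately, `FirstOrDefault()` returns `null` when no matching `h3` exists, so reading `InnerText` on the result would throw a `NullReferenceException`; a null-conditional `?.InnerText` with a `??` fallback is one way to guard against that.

```csharp
using System;
using System.Text.RegularExpressions;

// Made-up raw InnerText: line breaks and tabs around the title, plus a trailing space.
string rawTitle = "\r\n\t\tVintage Camera Lens 50mm \n";

// What the video does: strip carriage returns, newlines and tabs from both ends only.
string charListTrimmed = rawTitle.Trim('\r', '\n', '\t');
Console.WriteLine("[" + charListTrimmed + "]");   // [Vintage Camera Lens 50mm ]

// Parameterless Trim() removes all leading and trailing whitespace, spaces included.
string fullyTrimmed = rawTitle.Trim();
Console.WriteLine("[" + fullyTrimmed + "]");      // [Vintage Camera Lens 50mm]

// Hypothetical helper (not from the video): also collapse whitespace runs inside the text.
static string NormalizeWhitespace(string text) =>
    Regex.Replace(text ?? string.Empty, @"\s+", " ").Trim();

string collapsed = NormalizeWhitespace("New\n\tListing   Camera");
Console.WriteLine("[" + collapsed + "]");         // [New Listing Camera]
```

Because the space character isn't in the list passed to `Trim`, a trailing space survives it; interior line breaks are never touched by any `Trim` overload.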
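The price step keeps the first run of "one or more digits, a period, one or more digits". A minimal sketch with `Regex.Match` follows. It escapes the period as `\.` so it matches only a literal dot, since an unescaped `.` in a regular expression matches any character; whether the pattern typed in the video was escaped isn't visible in the transcript. `ExtractPriceWithCommas` is a hypothetical variant, added as an assumption, for prices with thousands separators, which the digits-period-digits pattern truncates. The sample strings are made up.

```csharp
using System;
using System.Text.RegularExpressions;

// Pattern described in the video: one or more digits, a period, one or more digits.
// Regex.Match returns the first match; "" is returned when nothing matches.
static string ExtractPrice(string rawPrice)
{
    Match match = Regex.Match(rawPrice ?? string.Empty, @"\d+\.\d+");
    return match.Success ? match.Value : string.Empty;
}

// Hypothetical variant (an assumption, not from the video): allow commas, then drop them.
static string ExtractPriceWithCommas(string rawPrice)
{
    Match match = Regex.Match(rawPrice ?? string.Empty, @"\d[\d,]*\.\d+");
    return match.Success ? match.Value.Replace(",", "") : string.Empty;
}

Console.WriteLine(ExtractPrice("$24.99\r\nTrending at $30.00"));  // 24.99
Console.WriteLine(ExtractPrice("$1,299.99"));                     // 299.99 (leading "1," lost)
Console.WriteLine(ExtractPriceWithCommas("$1,299.99"));           // 1299.99
```

As in the video, the dollar sign is not part of the match, so it disappears from the result.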
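The URL step reads `href` with an empty-string default, and stray whitespace has to be trimmed off later in the video. A hedged sketch of cleaning and validating such an href with `System.Uri` follows; `TryCleanUrl` is a hypothetical helper and the sample URL is made up.

```csharp
#nullable enable
using System;

// Hypothetical helper: trim a scraped href and accept it only if it parses as an
// absolute http or https URL. The "" fallback from GetAttributeValue is rejected.
static bool TryCleanUrl(string? rawHref, out Uri? url)
{
    url = null;
    string trimmed = (rawHref ?? string.Empty).Trim();

    if (!Uri.TryCreate(trimmed, UriKind.Absolute, out Uri? parsed))
        return false;

    // Reject schemes such as javascript: or mailto: that are not web links.
    if (parsed.Scheme != Uri.UriSchemeHttp && parsed.Scheme != Uri.UriSchemeHttps)
        return false;

    url = parsed;
    return true;
}

// Made-up sample href with surrounding whitespace.
if (TryCleanUrl(" https://example.com/itm/123 \n", out Uri? cleaned))
    Console.WriteLine(cleaned!.AbsoluteUri);   // https://example.com/itm/123
```

Returning a parsed `Uri` rather than a string means later code (opening the link, storing it) works with a value that is already known to be well-formed.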
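The listing-type text comes in several shapes: "Buy It Now", a bid count, and, as the end of the video shows, a "Reserve not met" note. A sketch of splitting that into separate fields follows, assuming the element's text reads like the examples in the transcript; `ParseListingFormat` and its tuple field names are hypothetical, and real eBay markup may differ.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: turn the raw text of the lvformat list item into structured fields.
// Assumes text like "Buy It Now", "33 bids", or "0 bids" followed by "Reserve not met".
static (bool IsBuyItNow, int? Bids, bool ReserveNotMet) ParseListingFormat(string rawFormat)
{
    // Collapse tabs, newlines and repeated spaces so stray whitespace doesn't affect matching.
    string text = Regex.Replace(rawFormat ?? string.Empty, @"\s+", " ").Trim();

    bool isBuyItNow = text.IndexOf("Buy It Now", StringComparison.OrdinalIgnoreCase) >= 0;
    bool reserveNotMet = text.IndexOf("Reserve not met", StringComparison.OrdinalIgnoreCase) >= 0;

    // Matches "1 bid" or "33 bids" and captures the number in front of the word.
    Match bidMatch = Regex.Match(text, @"(\d+)\s+bids?\b", RegexOptions.IgnoreCase);
    int? bids = bidMatch.Success ? int.Parse(bidMatch.Groups[1].Value) : (int?)null;

    return (isBuyItNow, bids, reserveNotMet);
}

Console.WriteLine(ParseListingFormat("0 bids\n\tReserve not met"));   // (False, 0, True)
```

Keeping the bid count as a nullable `int` distinguishes "no bid information" (Buy It Now) from an auction that genuinely has zero bids, which is the case that showed up as a $0 price.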
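The result count is raised by editing a number in the search URL from 50 to 200, but the URL itself isn't shown in the transcript. The sketch below assumes eBay's search page at `https://www.ebay.com/sch/i.html`, with `_nkw` for the keywords and `_ipg` for items per page; treat the endpoint and both parameter names as assumptions to check against the URL you actually scrape. `BuildSearchUrl` is a hypothetical helper, and the clamp to 200 follows the video's statement that eBay returns up to 200 results.

```csharp
using System;

// Hypothetical URL builder. The endpoint and the parameter names (_nkw for keywords,
// _ipg for items per page) are assumptions, not taken from the video.
static string BuildSearchUrl(string keywords, int itemsPerPage)
{
    // The video says eBay will return up to 200 results, so keep the count in 1..200.
    int count = Math.Clamp(itemsPerPage, 1, 200);
    return "https://www.ebay.com/sch/i.html?_nkw=" + Uri.EscapeDataString(keywords) + "&_ipg=" + count;
}

Console.WriteLine(BuildSearchUrl("xbox one", 200));
// https://www.ebay.com/sch/i.html?_nkw=xbox%20one&_ipg=200
```

Escaping the keywords with `Uri.EscapeDataString` keeps spaces and symbols in a search term from breaking the query string.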
Info
Channel: Blake B
Views: 16,995
Rating: 5 out of 5
Keywords: c#, web scraper, parse html, c# parse html from ebay, example
Id: BE708X6r24o
Length: 17min 13sec (1033 seconds)
Published: Thu Sep 28 2017