C# Web Scraper To Parse Html From Ebay

Captions
Hello, and welcome to the eBay scraper example. This is what I'm going to show you how to make: we're going to download some HTML from eBay, parse it, and get some information out of it. If I run this, this is what we'll end up with. You can see I have a bunch of information here from a bunch of products, and this information is from the sold listings page of eBay. I just searched for Xbox One, and we have a listing ID, the name of the listing, the price of the listing, whether it was a Buy It Now or a number of bids (this one's a Buy It Now), and a link to the listing itself. These are sold listings.

I am working in Visual Studio 2017, and this is a console app, so let me go ahead and make a new console app. Here's my console app; let's call this eBay scraper console. The first thing I want to do is get the URL that I want to parse information from. I'm going to go to eBay, go to Advanced Search, look at the completed listings, and search for Xbox One. This is the URL I want to search, so if I just copy this out I have my URL, and let's just define this somewhere.

Okay, we have our URL, and I'm going to use the HttpClient to retrieve this information. I'll make another variable for that, and I'll have to add using System.Net.Http to my project. What I want to do is load everything from this URL into a variable, so let's create another variable. This will be my HTML, and we'll use the GetStringAsync method and give it our URL. Okay, now I'm using my string here to get the HTML, so let's take a look at the HTML that we just retrieved. Let's write this out, and this will be html.Result. Now let's put a ReadLine in here to keep the program from ending before I have a chance to read the result. There we go. Now we have a bunch of HTML here. Where is this coming from? It's coming from our web page: if I come back to my web page, right-click, and inspect, we can see that we have exactly what is on this web page.

Now let's get into parsing some of this. At this point I want to make another function, because I don't want all of this cluttering up my Main method. Let me create another method and call it GetHtml. Now I'll just copy all this down into my new method. Perfect. It still runs, and I still have everything I want.

For parsing I'm going to use the HtmlDocument, so what I want to do is create an HtmlDocument. To be able to use it, I have to add a NuGet package. If I come over to my references and choose Manage NuGet Packages, I want to browse for the HTML Agility Pack. It's this one right here, so let's install that. Okay, perfect. I'll come back to my HtmlDocument, and we're going to use the HTML Agility Pack.

Now we need to load this HTML into our HtmlDocument so that we can parse it. This will be my document, and we can call LoadHtml. This has an error: we can't do this as written, because we need this to be async. So let's go ahead and make this method async, and we'll have to add an await here. Okay, there we go, so we can get rid of this. Now we have loaded our HTML into our HtmlDocument for parsing.

To understand what I'm going to parse out of this, let's take a look at our HTML. If I mouse over this, I can see everything that this code is touching and how it's defined. If I want to see just this item, for example, I can right-click it, inspect it, and it takes me right to that chunk of code. What I'm really interested in is this entire list of products. If I come up here, you can see that this is a list item, an li, and we have a bunch of list items here. These are all my products, and this is what I'm really interested in, because I want to get this list. What I'm going to target is this guy right here: this is the ListViewInner, and everything is also under my result set items.

Keeping that in mind, let's go ahead and parse this HTML and grab this list of items. I'll create another variable, and this will be the product list. I'll explain what this is doing in just a moment, but Descendants here is going to look for everything in my HTML that meets the parameters I'm about to type in. I want the ul, and let's look back over here so you can see what this is doing. In my ListViewInner we have this ul, so this is an unordered list, and this is my ListViewInner ID, so this is what I'm targeting. I want to grab all the descendants in my HTML document where the attribute value is ID (and if it doesn't find anything, it'll return an empty string), so we want the ID where... all right, I have an error here. What's the problem? Oh, I have an extra... okay. So this is what we're targeting: the ID equals ListViewInner.

Now what happens when I print this out? I'll just throw a breakpoint in there. We have one thing in our list, so let's take a look at its inner text. You can see that we have the name of the item here and some other information: the price, the shipping. So we have some information, but we'll want to break it up into multiple rows, because we want a list of products and right now we just have it all on one line. Let's do that. I actually want to call this product lists, because it's going to be the list of products, so this will just be products. Okay, now this will be my list, and this will equal... we saw this at position zero, so we'll select that, and the descendants of this would be... well, let's take a look.

Let me comment this out and take a look at that HTML one more time. Here we go, this is what I want. This is what's going to tell us how to break this list up into multiple rows, so we can have one product per list item. What I'm looking at is this li, so this is a list item. This looks like the unique ID for the item, and oh, here's actually the listing ID right there. What else do we have in here? Here's a link to the product itself, so this is my href. We have an image in there. We have a bunch of nice data: here's the price, here's Buy It Now. Let's see about parsing these things out.

This is the first one, and it's going to be repeated for every item. I'm going to copy this out so it's a bit easier to look at, put it in Notepad++, change the language to HTML, and scroll to the top. Okay, here we have our first list item. Let's search for "li id", because I think this is unique for every product. I know this is my first product, so if I search for that, there's the first one, and there's the next one. It looks like the items are separated in this format, so I can use that to figure out how to split these into multiple rows for my list.

Let's make another variable and call it product list items. We'll take the product list, and since we know we're at position 0, we'll go to its descendants, because we want the list item descendants. This will be the li, where our node's attribute value is ID. I'm using ID because that's my list item name. This is the attribute value, and I know that the name is going to have "item" in it, so I'm going to search for that. We'll add an Equals here... actually, I'll use Contains, with a lowercase i. This is not necessary. There we go, so ToList. There we go.

Now let's take a look at this and see what we have. In our list items we have 50 items, which is a very good sign. Let's open up one of them and look at the InnerHtml. Okay, this looks good: we have all of our information here for one item. If I go to the next item, I'll have the same thing for that item, and so on. This is exactly what I want. Now I can parse this and pull out the data for each item.

That is going to be it for this video. In part 2 I'm going to add a foreach loop that goes through this list and pulls out all this information, so be sure to check that out. I could not edit it down enough to get it into one video, so part 2 is coming up next.
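
The steps walked through in this part (download the page with HttpClient, load it into an HtmlDocument, find the ul with the ListViewInner id, then split it into li items whose id contains "item") could be sketched roughly as follows. This is a minimal sketch rather than the exact code from the video: the class name, the method names, and the sample markup are assumptions made for illustration, the search URL is not reproduced here and has to be passed in as an argument, and the live eBay page may not match the markup the video describes. It assumes the HtmlAgilityPack NuGet package is installed and a compiler that supports an async Main (C# 7.1 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class EbayScraperSketch
{
    // Stand-in markup (an assumption) that mimics the structure described in the captions:
    // one <ul id="ListViewInner"> holding one <li> per product.
    private const string SampleHtml = @"
<ul id=""ListViewInner"">
  <li id=""item1""><a href=""https://example.com/1"">Xbox One console</a></li>
  <li id=""item2""><a href=""https://example.com/2"">Xbox One controller</a></li>
</ul>";

    // Downloads the raw HTML of a page as a string.
    public static async Task<string> GetHtmlAsync(string url)
    {
        using (var httpClient = new HttpClient())
        {
            return await httpClient.GetStringAsync(url);
        }
    }

    // Returns the <li> nodes whose id contains "item" inside the first <ul id="ListViewInner">.
    public static List<HtmlNode> GetProductListItems(string html)
    {
        var htmlDocument = new HtmlDocument();
        htmlDocument.LoadHtml(html);

        // GetAttributeValue returns the supplied default ("") when the attribute is missing.
        var productLists = htmlDocument.DocumentNode.Descendants("ul")
            .Where(node => node.GetAttributeValue("id", "").Equals("ListViewInner"))
            .ToList();

        // Guard against pages where the list is not present at all.
        if (productLists.Count == 0)
        {
            return new List<HtmlNode>();
        }

        return productLists[0].Descendants("li")
            .Where(node => node.GetAttributeValue("id", "").Contains("item"))
            .ToList();
    }

    public static async Task Main(string[] args)
    {
        // Use the URL given on the command line if there is one, otherwise the sample markup.
        string html = args.Length > 0 ? await GetHtmlAsync(args[0]) : SampleHtml;

        List<HtmlNode> productListItems = GetProductListItems(html);
        Console.WriteLine("Items found: " + productListItems.Count);

        foreach (HtmlNode item in productListItems)
        {
            Console.WriteLine(item.InnerText.Trim());
        }
    }
}
```

Keeping the parsing in its own method that takes a string means it can be tried against a saved copy of the page, which is handy when the live markup changes. The empty-list guard is an addition that the video does not show; there, position 0 is selected directly.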
Info
Channel: Blake B
Views: 79,670
Rating: 4.8837771 out of 5
Keywords: c#, web scraper, parse html, ebay, c# web scraper, example
Id: B4x4pnLYMWI
Length: 17min 59sec (1079 seconds)
Published: Wed Sep 27 2017