How to scrape dynamic websites using Selenium C#

Video Statistics and Information

Captions
Hello and welcome to this video. In this video I'm going to teach you how to make a very simple web scraper that can scrape content which gets loaded dynamically. Take this webpage for example: the purple area is dynamic content, which basically just means that if I click on one of these menu items, the purple area changes but the menu does not. I'm going to try to scrape these collections.

To get started, I'll press F12 and then click on this icon right here (or press Ctrl+Shift+C) so I can select a collection name. I'll check it in the console first with document.getElementsByClassName — it was by class name — and we get 19 elements. I can see some of these elements are not collections, but that doesn't really matter; I'm just going to ignore that for now. So I'm going to copy this class name, card__contents, write var collections = driver.FindElements(By.ClassName("card__contents")), and then print out each collection in collections. It doesn't compile at first — I guess I need to rebuild — there we go.

First our web scraper will go to the main page, and then it will click on the Discover tab. So I'll select the Discover tab with the element picker, and it has an ID, which is perfect: I'll write var discoverButton = driver.FindElement(By.Id(...)) and then call Click(). Now we actually have a web scraper, but since this is dynamic content we won't really get any elements or collection names. I'll run it and show you: we click on the button and land on the page, but nothing gets printed out.

So what we're going to do is write a new method. Obviously, if you were making a real web scraper, you would probably have a separate scraper class and put this on it as an instance method, rather than a static method like I'm doing here.
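The naive starting point described above might look like the sketch below. The site URL and the Discover button's ID are placeholders, since the video never spells them out; only the card__contents class name is shown on screen.

```csharp
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

class Program
{
    static void Main()
    {
        // Assumes the Selenium.WebDriver NuGet package and a matching
        // chromedriver are installed.
        IWebDriver driver = new ChromeDriver();
        driver.Navigate().GoToUrl("https://example.com"); // placeholder for the site in the video

        // The button's real ID isn't stated in the video; "discover" is a stand-in.
        driver.FindElement(By.Id("discover")).Click();

        // Because the purple area is loaded dynamically, this usually runs
        // before the elements exist, so nothing is printed.
        foreach (var collection in driver.FindElements(By.ClassName("card__contents")))
            Console.WriteLine(collection.Text);

        driver.Quit();
    }
}
```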
The new method returns a collection of IWebElement; I'll call it FindElements and give it a By parameter. For now it can just return driver.FindElements(by), so calling it does exactly the same thing as before. What I actually want is a loop: on each iteration it tries to find the elements using the By locator, and if the collection contains any elements it returns them; otherwise it tries again and again. So I'll write while (true), find the elements, and return them — but only if the collection is non-empty. If it's empty, it doesn't return; it just loops again, with a Thread.Sleep of ten milliseconds between attempts. Now it should be able to scrape the dynamically loaded content. Let's test it... it doesn't work — oh, I forgot to actually use the new method. I'll run it again, and now it works. It also prints out some other stuff, but that doesn't really matter for this video.

Now, the problem is the while (true). That might be okay in some cases, but in other cases it isn't. Say, for example, you're using a proxy and it suddenly stops working: the scraper would sit in this loop forever. Or maybe the document doesn't even contain these elements at all. So you really want to put a limit on this to avoid problems. I'm going to use a Stopwatch — I'll import it and create one with Stopwatch.StartNew(), which creates a new stopwatch and starts it in one call, just to make things easier. Then the loop only keeps trying while ElapsedMilliseconds is under a limit, maybe ten seconds; otherwise it returns an empty collection, or perhaps null. In general that choice does matter, but in this case it doesn't make much of a difference, since I'm not doing any real error handling anyway.
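One way to write the polling helper built in the video is shown below. The name FindElementsWithRetry, the extension-method form, and the TimeSpan parameter are my own choices; the video hard-codes a ten-second limit via ElapsedMilliseconds.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Diagnostics;
using System.Threading;
using OpenQA.Selenium;

static class ScraperExtensions
{
    // Polls until at least one element matches `by`, or until `timeout` elapses.
    public static ReadOnlyCollection<IWebElement> FindElementsWithRetry(
        this IWebDriver driver, By by, TimeSpan timeout)
    {
        var stopwatch = Stopwatch.StartNew();
        while (stopwatch.Elapsed < timeout)
        {
            var elements = driver.FindElements(by);
            if (elements.Count > 0)
                return elements;
            Thread.Sleep(10); // brief pause between attempts
        }
        // Timed out: return an empty collection rather than looping forever.
        return new ReadOnlyCollection<IWebElement>(new List<IWebElement>());
    }
}
```

For what it's worth, Selenium's .NET bindings also ship WebDriverWait and DefaultWait, which implement the same polling-with-timeout idea in a reusable form; hand-rolling it as above mainly helps you understand what those classes do.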
I'll press F5, and it should still work. It doesn't really make a visible difference here, but let's say we have a very slow connection. I'm going to create a ChromeNetworkConditions object and use it to simulate one: set the download throughput to maybe 25,000, the upload throughput to 10, and the latency to, I don't know, one second. Then we add these conditions to the Chrome browser. Since NetworkConditions is specific to the ChromeDriver, but we saved our driver as an IWebDriver, I have to cast it: (driver as ChromeDriver).NetworkConditions = conditions. Now it should actually simulate a slow connection.

Next, I'm going to change the timeout path to return an empty collection — a new ReadOnlyCollection<IWebElement> over an empty list — and print a red line that says "done" so we can see when it finishes. If I run this now with the very slow connection, as you might be able to see, it loads for a long time. It's a little bit too slow, so I'll change the settings a bit — changing this to five seconds is basically the same. It's still very slow, and obviously you'll want to fiddle with this and try out different settings to see which ones suit your web scraper. Okay, it's loaded now — well, partly loaded — and it clicks; that was actually very fast. As you can see the images are not loaded yet, but I think you get the point.

So you can use this to test a very slow connection and see how well your web scraper handles it, and you can use the retry method to load dynamic content. This is actually a very important part of building a web scraper — being able to handle a slow connection — since you often want many proxies running, and some of those proxies may be slow. You should prefer very fast proxies, of course, but yeah, that is about it, I think.
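The slow-connection simulation described above can be sketched roughly as follows. The throughput and latency values mirror the ones mentioned in the video, but they are deliberately extreme; the URL is a placeholder.

```csharp
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;

class SlowConnectionDemo
{
    static void Main()
    {
        IWebDriver driver = new ChromeDriver();

        // Throttle the browser's simulated network. Throughput is in
        // bytes per second; tune these to match what you want to test.
        var conditions = new ChromeNetworkConditions
        {
            DownloadThroughput = 25000,
            UploadThroughput = 10,
            Latency = TimeSpan.FromSeconds(1)
        };

        // NetworkConditions is Chrome-specific, so the IWebDriver reference
        // must be cast back to ChromeDriver before assigning it.
        if (driver is ChromeDriver chrome)
            chrome.NetworkConditions = conditions;

        driver.Navigate().GoToUrl("https://example.com"); // placeholder URL
    }
}
```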
Info
Channel: Scrapax
Views: 5,131
Rating: 5 out of 5
Keywords: Selenium, Web Scraping, Automation, Dynamic website, page, Bot, C#, .net
Id: gRoMR3NcpPQ
Channel Id: undefined
Length: 12min 12sec (732 seconds)
Published: Sun Sep 01 2019