How To Web Scrape & Download Images With Python

hello everybody and welcome to another youtube video so in today's video i'm going to show you how to web scrape and download images from the internet now what we'll actually be doing is going on google images we'll be looking for a certain number of images grabbing the source of all those images and then using the source of that image to download the image to our computer there's a variety of reasons you may want to do this the first that i could think of is you probably need testing or training data for some type of machine learning model and maybe you want hundreds of thousands of images or you want a ton of different pieces of data obviously it would not be efficient to go manually download those so you could use a script like this now of course there's a ton of other reasons and what i'm going to show you here will work for other websites it's not only going to work for google images but you will need to tweak the script slightly if you're going to apply it on a different site anyways i just want to mention before we dive into the code here that there are some legal concerns with doing this so if you're just doing you know 10 or 20 images or something you probably don't need to be concerned but if you're running really large bots that are scraping like tens of thousands of images you may get ip banned you may have all kinds of things negatively occur to you and in fact if you want to potentially be able to avoid some of those issues with web scraping you should check out the sponsor of this video before we get started i need to thank smart proxy for sponsoring this video smart proxy is a smart but down to earth tech head platform that helps clients solve data access and entrepreneurship problems oftentimes you want to access or scrape data that is only accessible from a certain geographic location maybe you want to scrape and compare real estate prices data mine ecommerce products or acquire testing and training data for machine learning models well if
that's the case smart proxy can help you do this smart proxy lets you bypass country restrictions and website blocks by providing the highest quality residential proxies in over 195 locations with over 40 million residential proxies you can connect to your target source as many times as you need from any type of device smart proxy also has advanced rotating proxies so you never need to worry about ip bans in addition smart proxy provides a no code smart scraper tool to perform data collection that you can use for no additional cost with a smart proxy subscription using the smart scraper is extremely simple just install the chrome extension open up a website start the scraper and select the items that you'd like to export check out smart proxy today from the link in the description to get access to the best proxies for web scraping and use the code tech with tim20 for 20 off residential and data center proxies thanks again to smart proxy for sponsoring this video alright so let's go ahead and get started the first thing we need to do here is download and set up a few dependencies for this project these are just python libraries so we're going to need pillow we're going to need requests and we're going to need selenium now we'll discuss what all of those do in a second but first let's just install them so go to your command prompt or terminal if you're on mac or linux and then you're going to type the following pip install and we'll start with pillow now you actually want a capital p here so pip install capital p pillow now if for some reason that command doesn't work for you you can try doing pip3 install pillow so just tack a 3 on right here and if that doesn't work for you i will leave two links in the description on how to fix the pip command for your system one for mac and one for windows okay so now that we have pillow we're going to install selenium so we want selenium like that notice that i already have all of these installed this is going to install
a bunch of stuff but selenium is actually going to allow us to automate our web browser so that we can go and click on all of the images that we want to download get their source and then well download the source of the image okay so now that we have selenium the last thing that we want is requests like this this is going to be used to actually grab the data of the image that we want to download uh notice i'm getting some error here saying selenium 4.0 has requirement url lib you can just ignore that this will be fine in fact it actually fixed that error for me right here but everything should be good now okay so now that we have that we just need to install one thing that we need to use with selenium this is known as a web driver so just go to the link in the description i have it right here it is called chrome driver now i'm going to be doing this tutorial using google chrome i'm not going to show you how to do with a different web browser so just make sure you're using chrome otherwise this won't work anyways go to this link and then you want to click on one of these two versions chrome driver 95 or 94. now this is dependent on sorry the version of your google chrome so i believe i have google chrome version 94. if you had version 95 then you would download 95. anyways you can just try both of them if one of them doesn't work you can download the other one it's really easy to do this so i'm going to go with 94. 
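A quick way to confirm that the three installs worked is to ask Python whether each package can be found. This is a minimal sketch and not something shown in the video: `missing_packages` is a hypothetical helper name, and the only library-specific detail it relies on is that pillow is imported under the name `PIL`.

```python
import importlib.util


def missing_packages(import_names):
    """Return the import names that Python cannot locate."""
    return [name for name in import_names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # pillow is installed as "Pillow" but imported as "PIL"
    missing = missing_packages(["PIL", "selenium", "requests"])
    if missing:
        print("still need to install:", ", ".join(missing))
    else:
        print("all three libraries are installed")
```

If anything shows up as missing, rerunning the matching pip install command is the first thing to try.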
again i'll leave this link in the description okay and then what you want to do is click on your operating system so i'm going to be win32 obviously you have mac and you have linux so win32 is going to download a zip folder for me so let me download that i'm going to open the zip folder and inside of here you will see that there is an executable so what i'm going to do is i'm going to cut this executable and i'm going to paste this in a location that i know so paste it pretty much anywhere on your computer so that i can use it and reference it from my python script so what i'm actually going to do is go to my desktop i have a folder here called web scraping images this is where the tutorial is going to be i'm just going to paste the executable right inside of here i know the path of this executable it would be this right here so i could just copy that path and now i'm able to actually reference this executable so just put it in a place where you are able to reference it you know the path to it because we're going to need that path in one second okay so now that the executable is installed the steps will be a little bit different for mac and linux but i trust you guys can figure that out just extract the zip folder move the executable to a path where you know the location i can close this and this and i can go back to my python script and i'm just going to start by putting in the path of this web driver so i'm going to say path is equal to and then this now if you see something like this like this yellow highlight and kind of some errors here you just need to escape these backslashes so you just put two backslashes and then everything should be fine all right so let me just take a quick pause here and explain why we just did all of that so when we use selenium this allows us to automate our browser but we need a browser to automate so we need to download the web driver for whatever browser we want to use you could do this with firefox you could do this with safari but i
want to use google chrome so i downloaded the chrome web driver which now allows us to use that executable file to automate chrome okay that's pretty much all we're doing here so you just need this you need to supply this to your actual what do you call it script for it to know where the executable is and then use that to automate the appropriate web browser so i have this here it's in web scraping images but i need to put the name of the driver which is chromedriver.exe so i'm going to put that like that now i'm going to go up here and i'm going to import selenium okay now i'm just going to show you how this driver works so actually i'm going to say from selenium i'm going to say import webdriver like that and then i'm going to say wd standing for web driver is equal to selenium dot or actually i could just say webdriver and let me check here i believe this is with no actually it's not with a capital that's fine and i'm going to say chrome because we're using google chrome and then i'm going to put the path right here to this executable file great so now that we have that what i can do is just run my code and let's see what happens here notice that we're getting this deprecation error you don't need to worry about that right now and you can see that actually opens up a google chrome window because i just started my web driver so hopefully that makes a bit of sense let me run this one more time and you can see here in google chrome it's saying chrome is being controlled by automated test software that would be selenium okay so we've got the first bit of setup done now that we've done that i want to show you simply how we can download an image so if we know the source of an image downloading it is quite simple so let me just go here to google chrome alright so you can see i'm in google chrome here notice that i was on images some google images and i searched dogs now let's just do cats and let's click on some cat okay let's right click and click inspect and then
let's find the source of this image this is the source right here okay src is equal to this so this is actually the url it's using to display this image so src so what i'm going to do is paste this into a new tab and now notice it gives me the actual image itself that's great that's exactly what we want so if i just copy this here i can put this inside of my script i'll say image underscore url is equal to this and now i will show you how we can download this image all right so let's go ahead and write a function here that can download image so i'm going to say define download underscore image and then inside of here we are going to take the download path we are going to take the url and we're going to take the file name of the image that we want to download now before i go any further i do want to give credit to someone that i've kind of taken a lot of code from so there is this great blog post i need to paste it inside of here so you guys can see it on towards data science that says image scraping with python there's a ton of stuff in here i kind of just extracted some of the code that we actually need for this tutorial and i modified it but if you scroll down here it has a whole section on scraping images from google so a lot of the code i'm going to write comes right from here i've just kind of modified it and made it simpler but i just want to give credit to him i will leave this in the description in case you guys want to see the original source of a good amount of this code again i've just modified it and made a bit simpler because there was a ton and a ton of stuff in there that was kind of unnecessary for this tutorial regardless let's continue here so we have download image we have the url of the image we want to download the path to download it to and then the file name that we want to save the image as so what i'm going to do here is just say that my image underscore content is equal to and this is going to be requests dot get now that means i need to 
import requests so let's go up here and say import requests now this allows us to make an http get request to the url so we're going to say url like that and then we're going to say dot content and this will actually give me the content of the image right because if we go to the url where the cat is this is well the image content so now that we have the content of the image i want to save the content of this image in a bytes io file type or in memory as a bytes io data type so i'm going to say that my image underscore file is equal to io so i need to import io like this and then dot bytes io and then inside of here i'm just going to put image content now what this is going to do is kind of convert this content right here into a bytes io data type now this is pretty much kind of like storing a file in memory i won't really explain it much more than that but we're storing pretty much the image file right here directly inside of memory now that we have that though i need to actually convert this directly to an image so that i can save it so right now we're just storing some binary data that's essentially what bytes io is so i'm going to convert this now properly to an image using the pillow library and then save it so i'm going to say from capital pil import image like this and then i'm going to go here and say that my image is equal to and this is going to be image dot open and then inside of dot open we're just going to put the image file like this which will be our bytes io data type so this allows us to actually load this in as an image now what i can do is actually save this image but first i need to generate the file path so i'm going to say the file path is equal to and this is going to be the download path plus and then this will be the file name okay so that is our file path now i'm going to say with open and this is going to be i guess we want the file path this is going to be in wb mode which stands for write bytes meaning we can actually write an image
i'm going to say as f and then i will say image dot save i'm going to save it to f and i'm going to save it in a jpeg format okay let me look at my code i have some code on the right hand side of my screen or i guess to you guys that would be left but it's my right and just make sure this looks good everything looks good to me and we can actually go ahead and run this function now and see if this works so let's just print here success okay so let me just run through what i did here because i understand this is a bit complicated so first we actually get the content of the image this is pretty straightforward we're sending a get request to the url of the image that we want we then are saying the image file is equal to io.bytesio image content so we're taking the actual content from this request and we're going to now store this as a binary data type in our computer's memory this is very similar to storing an actual file in memory we then are going to convert this binary data to a pill image so a pillow image this just allows us to very easily save the image using image.save okay that's the first step we then generate the file path so we take the download path plus the file name combine those together that will give us the path to save the file and then we'll say okay we're going to open a brand new file that is at the file path so with whatever the name is there in wb mode which stands for write bytes so we can write bytes to this file we're going to load that as f and then we're going to say image.save we're going to save this image here to this file as a jpeg image and then print success so let's try this let's call download image let's go with download path just being nothing the reason why i'm going to put nothing is because i just want to download it in the current directory that i'm in so if you want to do that just don't put anything for the path and it will download it in the current directory for the url i'm going to pass the image url and then for the file 
name i'm just going to say test.jpg like that now make sure you add this as the extension otherwise it will not work okay so now that we have that let's go ahead and run the code so i'm going to run it like that let's give it a second to open up chrome i actually didn't even need to do that because we're not even using the web driver notice it says success and then if i go here to sublime you can see i have this test image and it's been downloaded to my computer great there we go okay so now we can download an image i'm just going to throw in a try and except here though because sometimes this could fail so i'm just going to say try and i'm going to say except and we'll say exception as e and i'm just going to print failed and then here we will print out e the point of this is just so that if this fails we can continue with the rest of our program we don't actually get an exception we just catch it and print it out okay so now that we've done that let's see how we can actually scrape a certain number of images from google images and then use the download image function to actually save the image now i'm just going to delete this images folder because this is what i was testing before so let's delete that and let's make a new folder so i'm going to go new folder like this we're going to call it images and now we can continue okay so we can get rid of this image url we can actually get rid of the call to download image because we're not going to use that right now now we can say define and we could say get images from google or whatever you want to call it doesn't really matter now inside of here we need to take in our web driver so i'm going to take in wd for our chrome webdriver we're also going to need a delay and let me think if we need anything else we're going to need a max images as well so this is how many images we are going to download maximum or how many images we are going to get maximum okay so i'm going to hop over to google images here and i'm
just going to run through the basic process of what we need to do here to actually click on these images and then grab the image source and download them so right now we see a bunch of images we also see some images up here okay like hidden whatever we see all of these images up here as well now when i click on the image i actually get the legitimate image source whereas when i'm just looking at them here this is just a thumbnail and it's kind of been resized it's not the actual image that i want and so what i would like to do is i would like to make it so that i click on these images and then i get the source of this image the large one that's showing up because that will be the real image i don't want to just download the thumbnails otherwise i'll get kind of the lower resolution images or just ones that are not the correct size hopefully you get what i mean so that's kind of part one that's what i want i want to be able to click on these images but then what's going to happen is as i continue to scroll down on this page we're going to load more images so notice how we get that kind of loading bar there so i need to actually have something that allows me to scroll down as well because if i want to download say a thousand images i'm going to have to scroll to the bottom of the page and then find all of the images now eventually you'll get to a point where you can actually click on load more results or it'll say you've reached the end so you do need to consider there is some edge cases here you might want to click a button that loads more results i'm not going to show you how to do that but that is something you may want to do and anyways that's kind of the first step so the thumbnails want to click on the thumbnails grab the image source and then download the image from the source however i need a way of actually finding all of these thumbnails and clicking on them so i need to find something that is similar between all of these images so i can look for something 
that represents all of these thumbnail images now some of you may be saying okay why don't you just search for all of the image tags on the page and if you're not familiar with selenium what you can do is you can search for class names you can search for specific tags you can search for text you can search for all kinds of stuff within an actual html document and so some of you may be saying why don't you just look for all of the img tags right because if i click on this and click inspect notice this is img well the reason i can't do that is because that's going to give me these images at the top of the screen as well so instead what i actually want to look for is a class name that all of these thumbnails are going to have in common so if i go here notice that we have a class name here and i'll zoom in so you can see it but it's Q4LuWd so that's the class name of this first image and if i inspect this second image right here so let's inspect it and go to it notice down here it has the same class name so what i'm going to do is search for this class name and any image that i find that has this class name i will actually try to download you'll notice if i go to this one here and i click inspect for some reason i have to keep clicking it two times this does not have that class name and so i won't get those ones i will only get the actual thumbnails okay hopefully that makes sense but let's start off by doing that so i'm going to write a function here inside of this function i'm just going to call this scroll down and this is going to take in a web driver and does it need anything else i actually think that's all it needs okay then what i'm going to do is say wd.execute_script now execute script means i can actually execute a javascript script and so i'm just going to execute a command that scrolls me to the bottom of the screen now i believe this is window dot scroll and this will be scrollTo and then we're going to put 0 and this will be document dot and then body dot
and then what is it here i believe it's scrollHeight so we're just going to say scrollHeight like that okay why'd it give me that i didn't want that okay so scrollHeight add our semicolon and there we go we now have the scroll down function i will also add something here i'm just going to say time dot sleep delay just so it gives me one second or whatever i make this delay to actually load the remainder of the images if i am scrolling down to the bottom of the screen okay so now that i have that inside of here i'm just going to paste a url now this url is going to be the url of the google images page that i actually want to scrape so what i'm going to do is just go to my browser tab here and i'm just going to copy this huge url which is for the search of cats if you want to search something else just search it copy the corresponding url and then just paste it inside of here there we go we now have our url and now i'm going to use my web driver to actually get the source the html source of this page so i'm going to say i guess wd dot get and then url this will actually load this page with my chrome web driver and then we can actually start looking through all of the images so first let's just actually run this function and see if this works so i'm just going to go down here and i'm going to say get images from google we need to pass to this a web driver a delay and max images so i'm going to pass my webdriver i'm going to pass a delay of two so two seconds and then i'm going to pass what was the last one i totally forgot max images is going to be 10.
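The scroll helper described above can be sketched roughly like this. It assumes `wd` is a selenium webdriver (anything with an `execute_script` method will do), and unlike the version typed in the video it takes `delay` as an explicit parameter instead of reading it from the enclosing function. That change is an assumption made here to keep the example self-contained.

```python
import time


def scroll_down(wd, delay):
    """Scroll the page to the bottom, then pause so more thumbnails can load."""
    # execute_script runs a javascript snippet inside the browser page
    wd.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(delay)
```

With a real driver this would be called as `scroll_down(wd, 2)` once `wd.get(url)` has loaded the search page.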
okay so let's run this and let's just see if it actually loads up this correct page notice it does it brings me to the cats page awesome okay so now that that is working i'm just going to add something here that closes my webdriver window because you'll notice that if you run this a bunch of times and you forget to close the chrome window you'll have like 100 of them open so i'm just going to say wd.quit and this will just close the actual chrome window once this function is done great okay so let's continue here i'm going to write a variable this variable is going to be image underscore urls this is going to be equal to a set just to make sure we don't have duplicate urls here we only have the same image one time and then what i'm going to do is make a while loop i'm going to say while the len of image urls is less than and then this will be the max images like this then we will continue so the point being once we have as many images as max images as we've defined here then we will stop all right so now that we've done this what i'm going to do inside of here is i'm going to start by scrolling to the bottom of the screen once i scroll to the bottom of the screen then i'm going to find all of the image thumbnails on the screen i'm going to loop through all of them try to click on them and once i click on them get the source of the image and then continue so i'm going to say scroll to end we're going to pass our web driver i am then going to say that my thumbnails is equal to and this is going to be wd dot find elements and inside of here i need to pass something that i need to import so i'm going to say from selenium.webdriver.common.by import By you'll see why we need this in one second but just write this import line then here i'm going to say By dot and this is going to be in all capitals class name and i'm going to put that class so let's go back here and remember what that class was that class was Q4LuWd again i'll zoom in so you guys can see that and let's paste
that right here so this is the class name that i'm looking for this is going to give me any tags that contain this class name if any of this selenium stuff is confusing again i have an entire tutorial series on it so i'll leave a link to that in the description anyways what this is going to do is find all of the elements on the page that have a class name of this that's why i needed to import by so i could specify we're looking by the class name then what i can do is i can loop through all of the thumbnails and i can try to click on them so i'm going to say for image in thumbnails like this and actually what i'm going to do is go here and say the len of and this is going to be image urls to the max images now the reason why i'm doing this is because this while loop will continue to run until we've loaded and well got enough image urls so the point is when i'm looping through the thumbnails i don't want to be looping through thumbnails that i've already looped through and so if i say have 10 thumbnails already when i run this command again it's going to give me those same 10 thumbnails plus any more additional thumbnails that were loaded after i scrolled to the bottom of the screen so what i will do is start looping only after the thumbnails that i've already added to the uh the thumbnails list or to the image urls list hopefully that makes a bit of sense i just don't want to be adding the same thumbnails multiple times so we start at whatever the len of this is to make sure we avoid doing that then of course we only go up to the number of max images so we're not adding more images than what we specified okay so that's why we have that slice there now inside of here i'm just going to do a try except and inside of try i'm going to say image dot click we're then going to wait by our delay so we're going to say time dot sleep and we're going to sleep by whatever the delay is now i need to import time because i realize i don't have that so let's go import time you don't 
need to install this this is a built-in module same with io as i'm sure you probably have realized by now so we'll try to click on the image we'll then sleep now we're just going to have an except and say except and then continue now the reason why i'm having this is because we could potentially get an error when we try to click on this image so if that happens we don't want to interrupt the whole script so we'll just continue which means go to the next item in the for loop so now that we've clicked on the image though we want to find that larger image and then get the source of that image and add it to our image urls so let's look at this now so at this point what we've done is if this will load here why is it so laggy come on fix yourself okay that's a bit better anyways what's going to happen is we would have scrolled to the bottom of the screen right and then we would have clicked on some image and this image now is loaded on the screen so the screen looks like this so if i go here and i click inspect we can see that this is inside of a div now it's inside of the image tag sorry chrome's just lagging really hard here but if i look at this image i want to find a class name or something i can use to identify this specific image and it turns out that if i go here and i look at the class names we have a class which is equal to n3VNCb now this is the only image on this page that has this class so what i'm going to do is i'm going to use that class to access this larger image and then add the source of that image okay so let's do that so i'm just going to paste this in here just so we have this all right so now what we will do is say that images is equal to and then this is going to be wd dot find underscore elements we're gonna do the same thing we did before we're gonna do by class underscore name in all capitals and then we're just going to pass as a string this guy right here which is the class name that we're looking for now ideally this should just give us one image
but it could potentially return multiple and so what we're going to do is loop through all of the things that this returns just in case something else does have the same class name as this we're going to do some checks on all these images and we're going to see if they have a proper image source if they do have a proper image source then we will get that image source and add it to our image urls so i'm just going to say for img or for image in images like that then what i will do is i will check the attributes of this image and to see if it has a source tag so i'm just going to say if image dot get underscore attribute and this will be src again the source of the image that we're looking for so if it does have that attribute and we will say http and this is going to just be http is in image dot get attribute src then rather than continuing what we want to do is actually add this source to our where is it image urls so again we're just checking if it actually has a source if it doesn't have a source then what's going to happen is this will return none and we'll just stop the if statement if it does have a source we'll make sure http is in the image source the reason we're doing that is to make sure that it's giving us a valid link that we can actually use to download the image so now what i'm going to do is say that my image underscore urls.add and i'm just going to add the image.getattribute and then this is going to be src after this i will simply print that i found an image so i'll print found image exclamation point like that great so now what we've done is we've added that to our where are we image urls so now all we need to do at the very end of the program here is just say return image urls now of course let me walk through this let me just zoom out a little bit so we can see this a bit better okay so we start up here we have our web driver our delay and the maximum number of images that we would like we then write our scroll down function which allows us to scroll
down to the bottom of the screen we have our url this is the target url of the google images page that we want for now this has to be google images page if you wanted to change this to a different website you just have to look for different class names to find the actual images that you want we then are getting the web driver to go to this page we're going to make our image url set this is going to store all of the urls of images that we found we're then going to say while the len of image urls is less than max images while we haven't found as many images as we want scroll to the bottom of the screen find all of the thumbnails of the images by this class name we then are going to loop through all of the images that we haven't currently looked at we're going to try to click on them we're going to give it a second to actually pop up the real image then what we will do is we will look for the real image in that kind of popped up window so something that has this class name we will then say for image in images this could potentially return multiple images so we want to look through all of these and find the valid image anyways we're going to say if image.getattributesrc so if it does have a source attribute on it and http is in this attribute that means this is a valid source for the image so we're going to add that to our image urls print found image and then finally once we go through the while loop we'll return image urls so now all we need to do is actually combine these two functions together and we should be able to download all of the images so we could theoretically inside of this function here just put download image and in fact if we do this we go download underscore image like this then we can just pass the image url that we found and we can download it but instead i'm going to do this outside of the function i'm first going to call this one get all of the urls and then i will loop through all the urls and call the download image function so we're going to
say urls is equal to this and let's start by actually just printing this out and making sure this is giving us all of the valid urls that we want so i'm going to say print urls when i do this and i run the code we can sit back and we should be able to watch chrome go through here and do its thing and of course we get an error what is the error scroll to end is not defined okay i guess that's because i called this scroll down so let's just fix this name here to scroll down okay error one fixed let's try this now and let's see okay so it's scrolling down it then clicks on an image clicks on another image it should continue to click on images and it's going to wait two seconds in between this and i believe how many max images i put i think i put 10 as the max so we'll do this 10 times of course you can change the delay and you can change how many images you want and this will happen faster or slower depending on that okay we should almost be through 10 here looks like we are almost good ah and let's see here okay maybe not okay we clicked on another image it seems like we're clicking on the same images here so i'm gonna let this run for a second and see what's going on but we should be finished soon okay so i was letting that run for a second and it seemed like it was never ending so i'm just going to add a log here and just print found and we'll do an f string just to see how many images we've currently found to determine kind of the progress level we're at here so i'm going to go with found and then let's go with the len of and this will be image urls just to check why this is actually taking so long and then what i'm going to do is change the delay here to one and let's just make it five images to see if this is going to work so let's run the script now and see what we get okay so we scroll down click an image i want to see my logs here so found one found two found three found four and then it prints out all of the urls and okay it looks like we were good on that 
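The URL-collecting loop described so far could be sketched roughly as below. This is a minimal sketch under stated assumptions, not the code from the video: the function name get_image_urls and its parameters are hypothetical, thumb_class and full_class are placeholders because the transcript does not state the actual class names, and driver is assumed to behave like a Selenium WebDriver. The early exit when a pass finds nothing new is an added safety stop that is not part of the walkthrough.

```python
import time


def get_image_urls(driver, page_url, thumb_class, full_class, delay, max_images):
    """Collect image sources from a results page until max_images are found.

    thumb_class and full_class are placeholder class names (assumptions).
    """
    driver.get(page_url)
    image_urls = set()

    while len(image_urls) < max_images:
        found_before = len(image_urls)

        # Scroll to the bottom of the page so more thumbnails can load.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(delay)

        # "class name" is the locator string behind Selenium's By.CLASS_NAME.
        thumbnails = driver.find_elements("class name", thumb_class)

        # Only try the thumbnails that have not been looked at yet.
        for thumb in thumbnails[len(image_urls):max_images]:
            try:
                thumb.click()
                time.sleep(delay)  # give the full-size image time to appear
            except Exception:
                continue

            # A click may expose several elements; keep the ones with a real source.
            for image in driver.find_elements("class name", full_class):
                src = image.get_attribute("src")
                if src and "http" in src:
                    image_urls.add(src)
                    print(f"Found {len(image_urls)}")

        # Added safety stop (not from the video): quit if a pass found nothing new.
        if len(image_urls) == found_before:
            break

    return image_urls
```

With a real browser, driver would be something like a Selenium Chrome WebDriver, and the two class names would have to be read from the page's HTML. Because image_urls is a set, adding a source it already holds does not make it grow.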
so let me change this to 10 now and let's see if we can actually find 10 urls so let's run this now and see what we get let's move this over okay so found one found two three four five and for some reason it is stalling okay there we go found six found seven i guess it just takes longer than i was expecting still only found seven okay found seven again how is it possible oh i guess we're finding the same image multiple times okay that would make sense found seven ah okay so i think i see what's going on here we keep clicking on the same image so let me see if i can fix this and i'll be right back okay so we determined what the problem was is that we kept clicking on the same image and since we kept clicking on the same image what would happen is we would just be looping through the same image a bunch of times and then we would return back to the same image and well this while loop is just going to go infinitely because we're on the same image so it's actually really annoying to fix this problem but what i'm going to do is implement something inside of here that pretty much says if we found the same image then what we're going to do is increment whatever the len of image urls is and max images so that we move on to the next image so you'll see what i mean here but i'm just going to put right here i'm going to say if the image.get_attribute('src') is inside of image urls so if we already have this source in the image urls then all we're going to do here is we're just going to break out of this for loop but first we're going to say image urls and we're just going to add to this uh actually i don't know if i can even add to this uh instead we'll do this we'll do max underscore images plus equals one and we're gonna say skips plus equals one we're going to write a variable skips is equal to zero and we're going to put the len of image urls plus skips and then in the while loop condition same thing the idea being here that if we're skipping by one or we're skipping by two 
images then we'll account for that as we're looking for the next thumbnails and as we're doing the while loop condition so we do actually end up getting the number of max images that we want hopefully that makes a little bit of sense but this is kind of the quick fix hopefully for this so let's run this now and let's see if this works because again what was happening is we're just clicking on the same image a bunch of times and well that was not helpful okay found one found two found three found four found five all right let's see if we can find another one found six found seven all right give us a found eight here we keep alternating between the same image what the heck okay and now we've clicked into some completely different images found eight found nine and there we go uh okay i don't know exactly what happened there but it looks like we ended up clicking on um some weird image that brought up an entirely new page i'm gonna count that as a success for right now let's run this now i'm just going to turn this down to 6 and what i'm going to do is use all of the urls now and download all of them so i'm just going to say for and we're going to say for i comma and then url in enumerate we're going to enumerate over all the urls then we will download them we'll say download image we're going to pass what do we want download path url file name so this is going to be slash this is going to be images slash we then want the url and for the file name i'm just going to go with the string of i plus dot jpg and this needs to be in a string okay let me quickly explain this so we are just looping through all of the different urls that we have we'll get the index of the url as well as the actual url itself we're then going to say download image we'll download to the images folder that's why i'm doing images slash make sure you add the trailing slash here then i'm saying the url is well this is the image you want to download and then the name of this image is going to be 
the string of i plus dot jpg just so i have a unique name for every single one of these images so let's run this now and let's see if we can successfully download six images to our computer you're going to note that it's going to find all of the urls first then once it finds all the urls it's going to download them one by one okay we should oh i guess so i guess it clicked on one of the top links which is what we were trying to avoid but regardless we're still getting some cats here so success success success now if i go to images you can see that we have all of these images downloaded to the computer starting at zero going to five okay so i'm going to call this a success for now uh again you probably have to tweak the script a little bit because what seems to be happening is it's clicking on images that we don't want it to click on but i think i've shown you enough to be able to kind of mess with this and adjust it to whatever website you're working with i'm not going to continue going through and trying to fix all the really tiny bugs i just think this is probably good enough and i will leave a link to this in the description if you want to download this code yourself and just go ahead and modify it and again a big shout out to that blog post i showed previously the link will be in the description but a lot of the code and the reason i was actually able to do this is because i followed along with a lot of the stuff that was inside of there all right so with that said i am going to end the video here i hope you found this helpful if you did make sure to leave a like subscribe to the channel i will see you in another one [Music]
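The duplicate-skipping fix and the final download loop from the transcript could be sketched as follows. This is a hedged sketch rather than the video's actual code: get_image_urls and download_all are hypothetical names, thumb_class and full_class are placeholder class names, and driver is assumed to behave like a Selenium WebDriver. The argument order of download_image (download path, url, file name) follows the transcript, but its body here is an assumption: it uses urllib from the standard library and writes the bytes unchanged, so the .jpg extension is only a naming convention. The stop when a pass makes no progress is also an addition.

```python
import os
import time
import urllib.request


def get_image_urls(driver, page_url, thumb_class, full_class, delay, max_images):
    """Collect image sources, stepping past images that were already found."""
    driver.get(page_url)
    image_urls = set()
    skips = 0  # how many repeated images have been stepped past

    while len(image_urls) + skips < max_images:
        progress_before = len(image_urls) + skips
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(delay)

        # "class name" is the locator string behind Selenium's By.CLASS_NAME.
        thumbnails = driver.find_elements("class name", thumb_class)

        # Start after everything already handled, found or skipped.
        for thumb in thumbnails[len(image_urls) + skips:max_images]:
            try:
                thumb.click()
                time.sleep(delay)
            except Exception:
                continue

            for image in driver.find_elements("class name", full_class):
                src = image.get_attribute("src")
                if src in image_urls:
                    # Same image again: widen the window by one and move on.
                    max_images += 1
                    skips += 1
                    break
                if src and "http" in src:
                    image_urls.add(src)
                    print(f"Found {len(image_urls)}")

        # Added safety stop (not from the video): quit if nothing changed.
        if len(image_urls) + skips == progress_before:
            break

    return image_urls


def download_image(download_path, url, file_name):
    """Fetch url and write the raw bytes to download_path/file_name."""
    try:
        with urllib.request.urlopen(url) as response:
            data = response.read()
        # os.path.join copes with or without a trailing slash on download_path.
        with open(os.path.join(download_path, file_name), "wb") as f:
            f.write(data)
        print("Success")
    except Exception as e:
        print("FAILED -", e)


def download_all(urls, download_path):
    """Download every URL, naming the files 0.jpg, 1.jpg, and so on."""
    for i, url in enumerate(urls):
        download_image(download_path, url, str(i) + ".jpg")
```

A call might then look like download_all(urls, "images/"), which assumes the images folder already exists. Since urls is a set, the order in which the files get numbered is arbitrary.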
Info
Channel: Tech With Tim
Views: 22,427
Keywords: tech with tim, how to web scrape, web scraping, download images, how to download images, gather images, source images, web scraping tutorial, pictures, images, downloading pictures, tim web scraping, selenium, selenium setup, web scraping setup, python, python web scraping, web scrape program, web driver setup, chrome, smart proxy, image URLs, web scraping tool, program to download images, chrome webdriver, fix pip for mac, fix pip for windows, web scraping code, tim scraping
Id: NBuED2PivbY
Length: 37min 28sec (2248 seconds)
Published: Fri Oct 29 2021