Excel VBA Introduction Part 49 - Downloading Files from Websites

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
welcome to this wise L vba tutorial in this video we're going to have a look how you can automatically download files from websites using Excel VBA we're going to start the video by looking at how you can declare a simple API function because sadly Excel VBA doesn't have a native method for downloading files from URLs so we're going to look at declaring a function and give you a bit of background information about how that works when we've declared it and it's working we'll show you how you can use that function to download a single file and then we'll take things a bit further and integrate some of the techniques we've looked at in previous videos to do with scraping webpages to find a download link and then using that to download a file we'll take that a bit further again and look at them how you can process a sequence of pages to download all the files on them and then we'll just add a few final finishing touches things like how you can allow the user to choose folders using file dialog boxes so they can choose where the files will be downloaded to how you can create new folders using file system objects to create folders if they don't already exist and then also how you can display the final results to the unusal using the Shelf function to open up a folder in Windows Explorer so quite a few fun things doing this video although it's quite long and detailed in parts let's get started okay so the starting point for this video is a brand new blank excel file which i've already just saved as a macro enabled workbook so from there I can head straight to the developer tab and the visual basic editor and as usual we can insert a new module I'm not going to bother renaming it at this point but I'm going to create a quick subletting in here called download single file because that's the first thing we're going to do now at that point we immediately hit a brick wall because sadly VBA doesn't actually have a native method for downloading files so our first real task in this video is to declare an API function that's going to help us with that task now I don't want to go into a huge amount of gory detail about how API functions work in general in VBA I am planning a separate video on that by the way which will hopefully appear in a reasonably near future but I think it is worthwhile having at least a little bit of background information about how API functions work in VBA so API functions belong to files called dll's and the particular dll file that we're going to be using in this video is one called URL Mon DLL you actually find this in in the system32 folder of your Windows installation so usually at C windows system32 and here you can see my url Mon file now this file contains a range of functions and associated elements which I can use in VBA and you'll see that there's a whole range of other DLL files in here as well the documentation for VBA and with with API functions isn't always that great Microsoft do have a few bits of documentation for the win32 API which is the one we'll focus on in the video on window API functions but the one for you are Arman isn't particularly great for VB I struggled quite a lot to find some official documentation it's best I could come up with from a web browser I've got this URL monikers and asynchronous pluggable protocol reference which is most part of the micro developer network site of course this is the official documentation for that particular URL Mon DLL so if you wanted to see all the functions that were available there's a little link there for functions which you could happily click on you can get a great big long list of all the functions available the particular one that we're going to use is one called URL download - file and if you wanted to see the documentation for that you click on that link and then as you have already got open up in a separate tab here now sadly the documentation that Microsoft provided there's no the BA or VB documentation it's all C++ which is completely useless to us for M for our VBA code so what we're going to do is is sort of show you how I've translated this into the VBA code for declaring that function so obviously the first thing we can tell from this page is the name of the function URLs download to file you can also see that it's got five parameters one two three four five and it's got a return type here of eight results which again isn't particularly useful for us in VBA items but we'll translate that into the appropriate data type it gives you a bit of a description about what the various parameters are and then it tells you a little bit about the return value as well and what sorts of errors could possibly occur with it so there are really actually only two parameters that we're interested in for the way our video works the first one is this one called szd URL which is the string that contains the URL to download and then the second one is the the file name which is another string that tells it where to download the file to now to declare this function in VBA what we're going to have to do first is back in the VBA editor we need to declare functions outside of Civility and so declarations for API functions can't belong to a subroutine so at the top of the module we're going to start with the word declared you can optionally add the word private or public to the start of that by the way as well by default they'll be public but if you wanted to restrict it to just this module you could say private declare then you can write the word function I'll try and spell that properly I've written it enough times I should be able to do that and then you can state the name in the function now weirdly the name of the function that you provide here doesn't have to match the name of the function as it's displayed on the page I've just shown you doesn't have to match the name of the function in the DLL it's usually good practice to do that and in this case I'm going to follow C so I'm gonna say URL download to file the next thing you have to do is state which library or which DLL the function belongs to and you do that by adding a live parameter to the Declaration and then in a set of double quotes state the name of the DLL that the function belongs to so in this case it's nice and easy we've just seen it's called URL Mon now we can get away with just passing in the name of the file here because it belongs to the system32 folder if you've got third-party dll's you want to reference then you have to stay the full file name folder path if they're not stored in system 32 and technically at that point that's actually all we have to do I know we've got some parameters to declare as well but imagining this didn't have any parameters all you have to do at that point is hit enter and that's the declaration for that the API function the next thing we have to do isn't immediately obvious from the documentation but we've declared a function called URL download to file and I made a point of making sure that it matched the name that was given to us on documentation page but sneekly that's not actually the name of the function as it exists in the dll if you scroll all the way down to the bottom of the documentation page then right under the very bottom you'll see the real names for the function or functions in this case the match URL download a file server actually two separate versions @w version than AE version w it stands for widen is to do with unicode strings and a is for ansi which deals with windows code page strings I'm not going to go to any more detail than that if you're interested there's plenty of information available within pages like this one if you wanted to have a quick read of that page and then the follow on material if you find yourself particularly bored one weekend maybe or you just want to immerse yourself into how strings work in Windows all you need to know is that the name of the function we're going to use in this video is URL download to file a so we've got two choices for how we could declare that we could either back in the VB editor we could say the actual real name of the function we want to declare so URL download to file a which means that when we're calling it in our other VBA subroutines that's the name of the function we'd call we've literally say URL bound no to file a now you prefer to have this a slightly more sensible name you might not want to call it URL download to file at all you might want to give you your own unique name you can still refer to the actual DLL function by adding an alias attribute to the declaration so if you say alias and then in a set of double quotes you put in the actual name of the function so here I'm just going to copy and paste URL download to file and then tag in letter a at the end okay so that's now a function declared when we call it in VBA will say URL download a file but it will actually reference a function in the DLL called URL download to file a okay so next we have to deal with these five parameters that this function has so let's just switch back to the documentation for the moment and I'm going to scroll right up to the top of the page and then I'm just going to copy/paste the five parameters so that I'm just going to copy this little section out of the code I know that I can't use these data types these are C++ data types but this will just help us to remember the names of the parameters and assign the correct data types in VBA so I'm just going to copy that to the clipboard then then back to the VBA editor and I'd like to break these across multiple lines so I'm going to use the space underscore continuation characters and then just paste those lines in just below now a long term completely ready look completely hideous at this point but we'll solve the problems a little bit by a little bit next we'll need to deal with the data types for the five different parameters because clearly the ones that are used at this point definitely aren't VBA data types so the first problem is that I call Pecola LP unknown is the name of the data type LP crops but lots in API functions balute sense the long pointer so I have a quick look back at the documentation you'll see stuff like that if you look at the parameter section you've got Pecola pointed to the controlling blah blah blah blah blah you genuinely don't care about this at this point trust me you don't care about this right now the only two parameters we're really interested in here at the URL and the file name the strings that represents the file will download and this file will save it as for everything else all the three of the other parameters all we care about is that they have the correct data types assigned to them that's important because we don't assign the correct data types and things tend to fall over so for pointers for you can essentially convert those into long integers so a 32-bit signed number so if I go back to the VB editor rather than calling this LP unknown as the data type we're simply going to say Pecola as long now I'm assuming in this case that you're working with a 32-bit edition of Office I'm going to talk about how you get this converted to a 64-bit edition shortly but for now we're assuming a 32-bit edition go with it cuz it'll be nice and easy to convert into the 64-bit edition in a moment once we finish with this one I'm going to put in the space underscore again to continue on the next line then the next thing we have to deal with here is MLP ect shrinks although it's a pointer LP this is also a string so it's some character I forget what the T stands for apologies STR string so we can take that away and we know that we need to pass in the URL which is of course it's going to be a string in VBA so SZ URL as string strung up spot it there we go as string and then a space underscore what's the next one again for the file name that will generate that's going to be a a string as well of course so we can assign a string datatype to that the D word stands for double word I believe which is an unsigned integer in C++ so again for this one it can just be assigned a long data type D doubly reserved I'm not even sure where they provide this to you I'm not too familiar with C++ I'm not sure why they provide you with this parameter as it must have a value of 0 assigned to it anyway and it's reserved so it has to have 0 assigned so if we just simply go back to the VB editor and again we can say as long and then the final one and LP bind state is called that that's only not a vba datatype but again LP long pointer so we can convert that into a long integer I've just forgotten the space underscore at the end of the previous line let's just do that so LPFM CB long pointer function call back as long then the very final thing to do just like any function in VBA you should declare the return type so in this case the return type is also as long so having done all that if you were to click away from any of those lines hopefully everything stopped being read and looks as though it's valid VBA code one last thing we have to do now for each of the parameters is declare them by Val so you may remember from previous videos we talked about the difference between by ref and by Val for the API function to work we've must declare these predators by Val so I'm going to type that in once in front of the first parameter and then just copy and paste it in front of all the others which is nice and easy because I've lined them all up so neatly then if I click away from that as all five parameters declare bivalve so essentially that's a version of that function declared that will work happily for a 32-bit edition of Microsoft Office now what if you needed your function declaration to support a 64-bit edition of office either exclusively or alongside a 32-bit edition currently the main thing stopping this one from working in a 64-bit edition are the datatypes of the pointer parameters these need to be converted into 64-bit integers what I'm going to do to demonstrate this is just copy and paste the declaration paste it in down below and then I'm going to modify the top version of this so if you were supporting office 64-bit exclusively and you didn't need it to work for a 32-bit edition what you could get away with is declaring your data types as and I love the name of this data type long long and but sadly I can't do that in this example because I'm running a 32-bit edition of office that data type doesn't technically exist fortunately Microsoft have provided the solution they've got a data type now called long PTR long pointer and this is quite clever because what we will do is in a 32-bit system it will work as a 32-bit number but in a 64-bit system it was a 64-bit number which is quite clever so if we just do that for each of our parameters now it's not always necessary to convert every single parameter into a long pointer some of them that could be stored as 32-bit numbers so ones that are pointers ones are just numbers some can can avoid this but in our case just for safety's sake I'm going to declare every single one of these as a long pointer one more thing we have to do as well is make sure the function is declared as pointer safe so I just say private declare PTR safe at the very top this indicates to the code that this is not a version that will work in both 32 and 64-bit editions now that we've got a version of the function that's going to work in both 32 and 64-bit distance of office what's the point in keeping this 32-bit only version around the reason you might want to consider doing that is if you're still supporting versions of office prior to 2010 because if that's the case this version of the declaration won't work at all so what we need to do is write some kind of conditional statement that checks if we're working in office 2010 or later declare this version of the function otherwise declare this version now of course you'll probably know about if statements but something you can't write a regular if statements outside of a procedure like a subroutine what you can do however is add a kind of preprocessor directive to the top of your module so what we're going to do is write a hash mark and then what looks very much like an if statement and we're going to test the result of a constant that checks if we're running in VBA seven so VBA seven was the the revisions of the code base that was released in office 2010 so if that's true then I want to declare this version of the function otherwise using an else clause and again I've got to proceed by else clause with a hash symbol I want to declare this version of the function of course then I've got to have my end if but again that's got to be a preprocessor directive so it's got to start with a hash symbol but end if and it looks to all intents and purposes like a regular if statement just with the addition of the hash symbols now the good news is that we're done with the tedious difficult work of declaring your functions and this really is always the worst part of working with API functions to be fair in the real world is more likely you'll be finding somewhere that you can just copy and paste these declarations from and indeed Microsoft to actually provide a document for VBA declarations for their common windows functions sadly this particular function is in part that document which is why we've had to create it from scratch so now that we've got you got it created we should be able to give it a quick test so in our downloads single file subroutine what we're going to declare a couple of simple variables let's say dim file URL as string and also dim destination file as string now to test this in a fairly simple way I'm going to download a file that I know exists as part of the wise our website so I've already browsed this in Internet Explorer but it really doesn't matter which web browser you're using just for this particular part in fact you don't even need to download the same file but I have picked this particular file for a specific reason we're going to reuse it later on in the video so at some point it is worthwhile coming to this particular page and then finding the link that says here that's where you can download any files associated with the video so what I'm going to do right at this point is right-click on the link that says here and I'm going to choose to copy the shortcut and head back to the VB editor and I'm going to say file URL equals and then inside a set of double quotes just paste in the URL that I just copied okay so now we need to specify where the file is going to be downloaded to and to do that I'm going to create a brand new folder somewhere on my machine so in fact I'm going to use my open Windows Explorer window from down here and we're just going to browse back to the root of my C Drive and then let's create a brand new folder in here which I'll just call for the sake of demonstration I'll call it VBA what I now need to do is back in the VBA code I need to say destination file equals and then in some double quotes C colon backslash VBA now it's not sufficient just to enter the folder path itself you need to enter a full file name and extension as well so I say V P a backslash and then let's say something like let's just call it VBA download dot zip for now okay so all that we have to do now is call our URLs download to file function and pass in the two paths that we've entered here so let's say download break on URL download a file and then we've got five parameters to fill in but we need to make three of them simply equal to 0 so the peak caller parameter needs to be equal to zero then the second parameter is the URL of the file which on a download that's file URL pull by a comma and we've got to specify where we're downloading it to that's nice and easy that's destination file and then two more zeros to fill in the values for the other two parameters so having simply kunal that all we do now is execute the code and when we've done that if I simply check back into my VBA folder we also find that I've now got a downloaded zip file okay so I'm just going to delete that file that I've downloaded at this point and then head back to the VB editor general to show you what will happen if I enter a URL that both doesn't exist or an invalid URL if I just for instance change HTTP to HTTPS and then just run the subroutine again I don't get any error messages or any indications that things haven't worked but of course I won't end up with a with an output file one simple way to check whether your file download has successfully started if I just change each TP to http you can check the result of this function so I say if URL download to file and then I'll need to open and close a set of parentheses around the argument list and then I can check if it is equal to zero then that means that the file download will successfully have started so I can say there something just for convenience debug print file download begun or file download started and then else let's just do it up print again and an ENDIF and instead I'll say debug print file download not started so I'm not the most inventive thing in the world but if I can just have a look in the VBA window at this point of the VBA folder nothing in there if I run this subroutine with a correct URL I'll find in a while then look in the immediate window file download has started and indeed it will have finished as well hopefully there it is I'll just delete that one again and then let's just quickly go back and edit the URL so it doesn't make sense anymore and then if I run that one again how many file download not started and of course I haven't got any output either checking to see whether the download is completed there is a horrific lis complicated ways it is so far beyond the scope of this video I'm not even going to think about starting to talk about it one of the potential way you could do that is using some kind of file system object checking if the file exists but then of course you'd have to leave in an appropriate amount of time to allow the download to complete with a file like this which is so small it's not too big a deal but for larger files that might be a little bit trickier anyway for the time being we're just going to fix a with this basic validation that checks at the file download has at least started and we just fix the URL back to the original and that'll do for this little section okay well now that we've established the basic file download system is working let's do something a bit more interesting with it what we'll do next is create a subroutine that scrapes a single webpage to try to find the video download link so what I'm going to do is use this same webpage again but what we're going to rather than point to the exact file URL we're going to try to write some code that will find it in the page and then use that to download the file so to make that work we're going to use a technique that we've used in a previous video in fact it was in this one excel vba part 48 scraping multiple web pages start involves setting a reference to a couple of new object libraries so back in the VB editor what we'll do first is head up to the Tools menu and choose references and then if we scroll down through the reference list you're looking first of all something called Microsoft HTML so you find Microsoft HTML object library and I've just gone past it there it is and then check the box next to that one and then scroll down a little bit further and find Microsoft XML so again this is a couple of very object libraries we declared or reference in the previous video on this topic so Microsoft XML so go with the latest version that you have available in my cases XML version 6.0 so if I check that box and then click OK we can now write a new subroutine but just get rid of the immediate window for the time being a new subroutine that's going to load a web page first of all let's create a new sub load webpage now we've used this technique in a couple of the previous videos so I won't go into all the all the detail but just to quickly go over the basics of creating an xml httprequest i'll start with a variable that can hold their new msxml to dot xml excuse me XML HTTP 6 0 so in my case I think I mentioned this in previous video is depending on which version of the library you've referenced that might be slightly different it might just be XML HTTP but go with whichever version you've referenced what I'm also going to do is declare a variable which holds the URL of the web page so I'm gonna say dim bit page URL so I'm going to start with that as a fixed point this time as a string I'm just make sure that's a capital L as well device I won't sleep tonight there we go and then what we're going to do is set vit page URL equal to and then I'm just going to go back to my web browser and copy and paste not the URL of the file itself this time but of the overall page so I'm just going to copy that from the address bar at the top and then head back to the VB editor and paste it in to those double quotes next I can open the new request so to do that I can say XML rec dots open and the method internal apply is get the URL enter no apply is vid sorry vid page URL and I don't want to let this happen asynchronously I what essentially want this request to wait until it's finished loading before I proceed with the rest of the code so in this I'm going to set the async parameter to false then I can say XML adopt send which will send the request sorry beg pardon XML req dot send and then I'm ready to check what status at that request was so I'm going to say if XML rec dot status is not equal to 200 indicating that it successfully returned a page then I'm just going to say end if and then have a really basic message box that says something along the lines of problem and VB new line and XML rec scuse me that was meant to mean ampersand rather than asterisk leaving your line amps and XML req dots status ampersand and then a dash to separate that from XML rec XML req dot status text so if anything goes wrong then at least we'll get some useful information about it then I also want to exit the subroutine at that point and then we're basically at the point now where we can pass the result of that into a new procedure that's going to interpret it and scrape it to find the URL of the download file so I'm going to create a new sibling team below this one let's just scroll down a bit and below the end server I'm going to create a new sub which is going to be called something like find file link if I can type that properly find file link now I look for this one to work it's got to have some text which you can use to generate a new HTML document I'm going to open up a set of parentheses and declare a single parameter called HTML text as string inside there then I'm going to declare a new variable called dim HTML doc as an MS in fact as a new MS HTML and dot HTML document so this is going to be used to hold the the HTML document object of course now to populate that new blank document what I'm going to do is say HTML doc dot body dot in a text by Q about an inner HTML equals HTML text to trigger that to actually populate that parameter what I'm going to do is call this find file link from the previous procedure so I can say find file link and then I'm going to pass into it the XML req dot response text so having done all of that we should now have a new HTML document to work with now before we go any further with writing the code what we should probably do is have a quick look at the HTML of the page to see what we're working with so to do that you can use any web browser I'm going to use Internet Explorer again as that's one that I have open and if you want to inspect the HTML of the page or a particular element of the page the simplest thing to do is right-click on it and then choose inspect element again it's a technique we've used in several of the previous videos of this series so that will open up the document object model Explorer and eventually it will take you to the place in the HTML document representing the thing that you've just right clicked on so we can see from that that this little link here this here keyword is a hyperlink it's got a tag of a indicating a hyperlink with a unique title attribute video files download and an href attribute which indicates the URL of the file that it's going to download of course now sadly I did love to be able to pick out that single specific element uniquely out of the entire document but it hasn't got any identifying piece of information that I can use to do that it hasn't got an ID or a name so I can't pick that single item out what I could do is pick out all of the a tags out of the entire document and then loop through them all to see which one has got this title video files download it's a fairly good guess that that would be the only hyperlink in this page with that title and even if it isn't then it was most very much likely to be the first one in this page so let's have a go at writing some code that will loop over all of the a tags in this document but to the VB editor and we'll start by declaring a couple of new variables so the first variable I'm going to declare is going to hold all of the a tags or all of the elements representing hyperlink tags so I'm going to say dim I'm just going to call it links as an MS HTML dot I HTML element collection so I can hold a collection of HTML elements of course then I'll have another variable called dim link which I'm going to say is an MS HTML dot I HTML element okay so that should be fairly familiar to you if you've watched some of the previous videos in the series what I can then do is say set blinks equal to HTML doc dot gets attribute sorry back if I don't get elements by tag name and then in a set of round brackets and double quotes the name of the element that I want to get or the name of the tag of the element I want to get is letter A so that will get a reference to all of the hyperlink tags in the entire document so now I can loop over all the links in the document by using a for each loop so I can say for each link in links and then say next link this is a very familiar example to if you've watched the previous couple of videos in the series all I'm going to do for convenience is say debug print link dot let's see in a text that will give us the the text that should the link should be attached to followed by a comma and I'm going to say link dots get attribute what I want to do is get the at the H ref attribute from the link so having done that I'm going to display the immediate window again and then I'm just going to clear its contents and then what I should be able to do I just increase the height of the immediate window of it what I should then be able to do is click back into the load web page procedure which will load the original web page and then pass the HTML of it into the find file link okay so let's just run the load web page routine and then let's see how many links we get printed out it's quite a lot I could have done a fine find the length of the links of the links element collection so let's just have another quick go that let's have a debug print links dot length so we can see how many we're actually dealing with links dot length will tell you how many items belong to the collection so I'll just clear the contents of the immediate window again and then head back to the original routine loading the page then run it all again and right at the very very top 132 links it said just before my mouse fail there we go 132 links that's quite a lot let's head back to the HTML of the page and see if we can work out a slightly more efficient way to do this because lipping over 132 links is a little bit excessive so let's head back to the HTML if we have a look at where this a tag sits it's inside a P tag a paragraph tag which itself is a child of a div tag now the great thing about this div tag here is it has a class attached to it is yl video show post or w/o video show post so one of the really nice methods of the HTML document is get elements by class name so if I can get all the elements and I suspect there's probably only one a div tag in e with a WOL video show post class but it will silly fewer elements than the entire document so let's have a quick go at getting a reference to that single individual element I'm going to go back to the VB editor and we'll have a new variable which will hold a reference to it so let's say something like dim let's call it video div as an MS HTML dot I HTML element what will then do is immediately after we've loaded the HTML document we're going to say set video div equals HTML doc dots it gets elements by class name and then we need to pass in the name of the class that we've just seen too it's wo video show post now of course that will return a collection of elements which isn't necessarily what we want even if there was only one element in the collection it would still return a collection and I've said I've declared variables in HTML element not a collection of those so neat little trick we can use is right at the end of the get elements by class name we can open up a set of parentheses and refer to the first element in that collection it's a zero-based index for the element collection so we enter a 0 in parentheses that will return the first div tag or the verse the first element with that class into that variable once I can now do is change which object or which element I'm applying this method to so rather than trying to get all the a tags from the entire HTML document I can instead just copy and paste video div and replace HTML doc so what this will do is you'll get all the elements with a 8 with an a tag inside that video div so let me just clear the contents of the immediate window now and then run the subroutine one more time will hopefully find we got a lot so big fun I forgot to run the one above that one the load page let's run the one above we should find far fewer links just three this time so that's a much shorter loop to loop over so we've certainly made this loop a lot more efficient only looping over three elements rather than 132 of course but we still got the problem that we need to pick out the unique element that we're interested in so one way we could uniquely identify the element is using its title attribute so cycle attribute has the text video files download now is fairly likely that's going to be unique amongst those three tags so let's head back to the VB editor and see we can test for that so inside the for each loop we could write an if statement that checks if link gets attribute open some parentheses and double quotes title equals them in some double quotes we can say video files download then we'll print out the information we've asked for previously it's worth bearing in mind that string comparisons in VBA are of course case sensitive so if you wants to eliminate the possibility of getting that casing incorrect we could replace the string with all lowercase text and then convert the result of the get attribute function to L case or lower case of course and that will eliminate the the possibility of getting things mixed up based on case so a debug printing the inner text and the href attribute and then say end if it's also worthwhile just adding one more little line of code inside the if statement if we've established the link we're looking at is the one that we want it's pointless continuing with this for each loop to process the remaining links so what I'm going to do here is say exit four and what that will mean is that once we reach this in at the next link it will mean that the link variable will hold a reference to the specific element that were interested in so let's just clear out the contents of the immediate window and then let's go back up to the top and what we're going to do is run the entire subroutine again or pair of subroutines and this time although we're looping over three elements we're only printing out information for one of them and it happens to be exactly the one that we want it's also worthwhile thinking about what happens if we get to the end of the loop and we haven't found the link for the file it's certainly possible that some videos don't have a file to download that goes along with them so one simplistic way to handle that will be to declare a new variable let's just say something like dim file link found as boolean and what we can do before we start looping is set that variable equal to false so in say file link found equals false and then inside the loop what we can do is if we do establish that we have found the link for the file before we exit the for we can say file link found equals true finally at the end of the loop after we finished looping through all of the possible tags if we find that we haven't set the file link found variable to true that means we haven't found the file so we can check if not file link found spell it properly file link found we can then simply say exit sub so we do that in a single line if we could also debug print I suppose to give us a bit more information so rather than just exiting the sub we could provide a bit of diagnostic information let's say debug print file no file for this video something along those lines no file for this video okay we'll have an end if at that point so that will handle what what should be done if we establish that there isn't a file to download at all now we need to pay a bit of attention to the URL that we've returned from the href attribute because it's not quite in a valid format for passing into our URLs download a file function we need to get rid of this about colon parts and replace it with the root of the Y's our website so www is a bucket at UK or with the HTTP part and there start as well to create a fully formed URL that can be used to download a file so where we do this is pretty flexible actually we could do it inside our if statement inside our for each loop because that will only happen if we found a link that is the video file download link or we could wait until after we've gone through our little bit of validation and exited the subroutine if the file link wasn't found just to separate the bits of code out to make it a little bit more clean and readable I'm going to go back up to the top and just declare a new variable which is going to hold the file URL so you say dim file URL as string and then what I'm going to do is head back down to just below this if statement and then look at how we can construct that URL just to make my life easier I'm going to start by setting the file URL variable to be equal to the link dot get attribute href so having two nuts I know that that's what I've got stored in there at the moment the next part is to find the position of the colon character in there and then return all the string of text after that part so we're using the same technique that we've used in the previous video if you remember that so you can probably skip ahead at this point so file URL equals we're going to use the mid function so I'm gonna get characters from the middle of the file URL and the character that I want to begin at in this case is the position of the colon character plus one so I can use the in stro in string function to do that so I can say instra file URL and then I'm looking for a colon character and then close the parentheses and then say plus one and what I want to do then is return all the text to the end of the string so I could pass in a length that was equal to the entire length of string or I can just miss that promise out altogether so I just close the parentheses at that point and then I'll just give that a quick debug print so you can see what we end up with by sorry big one nearly finished debug to print a file URL there we go myself I just then clear out the contents of the immediate window and then head back to the previous subroutine up at the top and then give that one a quick run we want to find that at that point we've got everything after the colon character from the files URL all that we need to do now is concatenate to the start of that the root of the Y's l website so just going back to the web browser then the root of the Y's on pages HTTP colon forward slash forward slash www-why is a loco dot uk' so I'm just going to copy that to the clipboard I'm probably likely to use that part of the URL quite a lot in this system where I'm getting stuff from the Y's our website so it might be worthwhile at this point scrolling back up to the top into the declaration section of my module and before I start any subroutines I can say well this this isn't going to change it's going to be a constant so I'm going to declare a public Const wall URL as string equals and then inside I set up double quotes paste in that string so what I can now do as long as I put in a capital l otherwise again I won't sleep tonight tuple URL and then what I can then do is write down at the bottom again just before I debug print file URL I can say file URL equals wall URL can't even spell that now while you are l ampersand file URL so one last time just to run the entire thing if I clear the contents of the immediate window just to make sure we're getting the right results if I run this one one more time we'll end up with a fully formed URL to get that file downloaded okay so what I'd like to do now is create a little system that will download that file using the URL that I've just constructed so to do that I'm going to create yet another sublet in that I'm going to pass this URL in to and actually I'm going to copy and paste one of the very early ones that we wrote so right up towards the top the sub that downloads a single file if I just take a copy of this because the structure this is essentially identical to what we're about to do now scroll all the way back down to the bottom and then paste that in I'm going to change the name of the subroutine so that it's got a unique name its download file from page should we say something like that and then what I'm going to do is convert the URL variable into a parameter so I have to pass a value into the subroutine in order to launch it so I can just take that part and place it into the parentheses in the subroutines definition I'll just take away that variable then now in order to call this one from the previous routine all I'll have to do is say download file from page and then pass in the file URL that I've just constructed let me just comment out the debug print statement in fact I'm going to go back up and comment out a couple of the debug print statements so we don't fill up the immediate window with all sorts of junk now there's a few things we'll have to do to the file URL that we pass in just to get their final write results but just a very very quick simple test let me just get rid of the line which tries to construct the file URL inside this subroutine and then we'll spit out the destination file to the same VBA download zip so the same generic name as we had earlier on we're still downloading the file URL I'm downloading it to the destination file so currently that folder should be empty let's just scroll all the way back up to the original load web page and then if I run that entire thing and then just have a very quick look back in the VBA editor we've now got the same file downloaded as we have before now this generic filename is absolutely fine when we've only got a single file to download but later on in a video and we've got lots of files we'll want to each one to have a unique name probably based on the original source file name so let's just head back into the VB editor and scrolling back down again I'm just going to comment out another couple of these debug print statements as I go in fact I'm going to delete one or two of them altogether so printing the link length that's absolutely redundant now let's get rid of that and I'm just going to get rid of the inner text and Link attribute as well so let's just comment that one out for the time being I accidentally commented out the one that was an error message I'm going to bring that one back in suit me sorry about that and then that one can remain commented out so back down into the super team where we're generating the destination file what I want to do this time is get first of all all the characters to the right of the last forward slash character in the file URL so let's just add a couple of quick petrol lines here destination file equals mid so just use the mid function to find the position of the first cold on character now I'm going to use the same thing to find the position of the last forward slash character so to do that I'm going to say mid file URL comma we use the instruction to search from the left of a string we can use the instruction to search from the right of a string so I'm gonna say instra Rev then open up some parentheses I'm going to look inside the file URL and I'm going to try to find a forward slash character and then if I have a quick look at the results of that by again debug printing of course let's say debug print destination file get that right in a second there we go and give this one a quick run right in the very very top so let me just clear out the contents of the immediate window and then give this one a quick test so there we go so I've got everything to the right of the including the other four slash carriage that's quite important next job I don't want my filenames contain these percent to zero symbols I want them to be replaced with spaces so once again back down to the subroutine right down on the bottom and what I'll then do is say destination file equals and then I'm going to use the replace function to replace within the destination file any % to 0 character sequences and I want to replace all of those with a space so again at that point if I just clear the contents of the immediate window and head back up to the very very top and then run this one one more time I will find that I've now got a file name it looks pretty sensible and it's based on the unique file name from the web page so all that remains is to concatenate to that the C VBA part so let's just move the in fact let's just get rid of the debug dot print statement all together and if I simply now say destination file equals C VBA ampersand destination file that will give me the correct destination and sensible file name to download that file so currently this file this folder should just have a whole bunch of overwritten copies of VBA download but this time if I scroll all the way back up to the top and run the load web page routine one more time I will hopefully find in the VBA folder I've now got a sensibly named file with spaces in place as those silly percent to zeros now it's actually great that we've got this file downloaded because this is the starting point for the next part of the video so I'm going to get you to right click on the zip file and then choose extract all which will just extract it to a new destination I'm just going to uncheck the show attractive files box and if I click the extract button that will give me a new folder with the extracted version of the file I can then double click into that folder and then to pull it into the folder in there and then finally we've got a file in there called scraping multiple web pages I'm going to give that a quick double click to open it and I'll get a security warning about enabling macros how well do you trust me I'm going to click anymore macros I know that I wrote that one so it's fairly safe and then having opened that there's not much in the workbook itself putting the VP editor there's a whole bunch of modules or at least one module that's got some useful code in it so if I have a quick look in the scraping multi the multiple pages there's a ton of stuff in here that essentially loops over a whole range of different video pages and print out the name of each video and the URL of the page which contains the video if you follow through with the video that we created that in which we built up this you might have your own copy of this if not feel free to use the one that I've just downloaded here what I'm then gonna do is I want to copy of all this code in my in the project that I'm working on so the simplest way by far to do that is just to click and drag module 1 from the scraping multiple pages project and simply drag that straight into the book 1 project so I'll get a copy of that'll be renamed to have a unique name so it gives it module 11 and then I can happily just close down the scraping multiple pages this is really useful now because there's a whole bunch of code in this new module that we've just copied that we're going to use to essentially download every video file every every file associate with every video on the wise owl website video pages just to give you a quick idea about what this code does first of all though I'm going to close down the immediate window head to this new module 11 and when that does open I've got a sequence of subroutine so get video page first of all which essentially loops through the categories of videos on the Y's L video home page I can just quickly show you that in fact that might make sense too because of the videos home page this loops through all the categories on the left hand side and then for each one of those categories what happens is it loads the page related to that category before moving on to the next procedure so imagine we found the vba category we'll go on to their first of all and then the next procedure loops through all the videos on that page and it does that with a little bit of extra code to handle if the page has paging controls so right down the bottom of the VBA videos page there are some paging controls because there are so many VBA videos so it loops through all of those and then on each page that it goes to it will loop through all the little grey boxes for each little little grey boxes it copies out the name of the the video that links to and also the H attribute for that link and it ends up with a brand new worksheet for each individual video category so just to give you a quick look at the results of that we're in a workbook with a single blank worksheet currently if I go into module number 11 and then simply click into the first subroutine and then start to run it we ought to end up with a whole range of worksheets which should eventually appear quite slow at this point but eventually we'll end up with a whole range of different worksheets if I go back into Excel we'll see that each sheet contains a list of videos and URLs for each individual video category and this is really useful because what we're going to do is use this as the basis to browse to each of these video pages and then essentially run the same code that we've just used to pick out the video file for a single video page so let's just write a quick routine first of all that's going to tidy up all these worksheets we'll run this a few times just to test it and a quick way to get rid of all these sheets so back into the VB editor first of all and let's have a new little module for this a separate little utility module I won't spend too much time on this is just meant as a quick little help a method so I'm just gonna call this delete all sheets but she one just to give it a nice descriptive name and then very quickly dim WS as worksheet and then say for each WS excuse me I'll spell each properly second and the second for each WS in this workbook dots worksheets then a quick next ws and then inside the loop all I'm going to do is say ws dots and delete I want to make sure that happens without the applications alert message disappearing because I'm going to say application dot display alerts equals false and then at that point I can happily run that one to get rid of all the sheets apart from sheet one except I haven't added in the if statement yet have I I want to make sure that I only do that for every sheet except she one so I'm going to say if not and ws is sheet 1 so I'll use the code name of the object then WS dot delete ok so just running that one really quickly I'll end up with all the cheese disappeared except for sheet 1 I'm also just going to quickly rename some of these modules as well so the moduli min right now probably won't have much more to this one but I'm gonna call it em underscore utility and then I'm going to call the first module up here M underscore download files and then finally this one that I just copied in I'm going to call this one M underscore process video pages so just give us a bit of an idea about what's in each module the next thing I'd like to do is make quick modification to one of the previous subroutines we've written in this video so it's in the download files module if I just double click back on that one and the one that I want to modify is load web page so basically I want to be able to trigger this one for each video page that my main loop on in the process video pages module processes so I'm going to make sure that I can pass in the video page URL essential I'm just going to convert the vid page URL from a variable into a parameter just by selecting and clicking and dragging and then I can happily just delete the remaining dim and the line that sets the webpage URL to a specific page so what that one will then do is load the video page whichever URL is passed in and then trigger the rest of the sequence as we've just seen the point at which this load web page routine is going to be triggered is back in this process video pages module so I head back into there and then scroll all the way down to pretty much the very very end of this so this is little part of the loop which is processing each individual video on the page so going back to the web browser this is processing each of these individual gray boxes and for each one of these we could access to a hyperlink to take you to the video page so back in the VB editor just before we move down to the next row so what of course we're listing out the information about each video just before we move down to the next row in the worksheet I'm going to say load web page followed by space and the video page URL that I want to pass in is already stored in a variable here called vid URL so we've already extracted that using code in the previous video so let's say load vid page sorry load web page vid URL so what we should need to do at this point is give the entire thing a quick test we ought to get all the video files downloaded to that same single folder so let me just cut back into the vba folder first and let's just give this a quick tidy up if I select VBA and then ctrl a to select all and then delete them all then back to the VB editor scroll all the way up to the top of the new module we've copied in the process video pages then all we should really need to do is run this get video page routines if I hit the Run button to do that may take a little while for this to process but we should start to see these new worksheets appearing although they show off as modules weirdly most of the the project explosion refresh very quickly but anyway eventually we'll end up with a list of all of the same pages as we've just seen and then we should also find that in the VBA folder we've got a list of all of the video files that belong to any of those video pages that's pretty impressive that's reasonably impressive but there's a whole bunch of ways we could sort of tart this up and make it look a little bit more user-friendly so first of all let's just get rid of everything that's in this folder already I'm gonna select press control a to select everything and then hit delete and then head back to the VB editor and back to my little utility module and I'm just going to run my delete or she but she one routine and then what I'm going to do next is head back into the process video pages module right at the very very top here what we're going to do is at the simple little controlling routine so one that's going to be used to trigger off the whole process but with a few extra bells and whistles a couple of nice additional things I'd like to be able to do are want to give the user the choice for which folder to download the files to and second of all inside that folder to create a separate sub folder for each video category so to achieve that let's start by declaring a couple of extra variables up at the top I'm going to declare a public string variable called main download folder and I'll set that as a string of course and then I'm going to say public vid cat folder which is also going to be a string then I'm going to create a main method that's going to control this entire process so I'm going to say sub download wall video files and then inside there I'm going to say main download folder just for the moment as a main download folder equals C colon backslash VBA so I'll just set it to that fixed folder path to begin with now the video category folder we can set that when we generate a new video category so if we scroll down a little bit further into the first sub routine what was the first so we didn't get video page there's a little section here there's looping over all of the video categories in the video ATS collection so what I'm going to do is before I say call list of videos on page I'm going to say vid cats folder equals vid cat and dot in a text that's the same property that I'm sorry in a text not in a in HTML so that's the same property that I'm using there to state the video category names that's what labels each worksheet were there with the correct name so I've got those two folder names generated what I need to do now is work out how to generate those folders in the part where I'm copying the files next we need to head back into the download files modules let's give that a quick double click and then let's head all the way down to the bottom to the subroutine that deals with actually copying the files of their final destination I'd like to have a separate variable to hold the destination folder path before we concatenate everything together so let's say dim destination folder and we'll set that as a string of course then I'm just going to concatenate that out of its two component parts so destination folder will be equal to main download folder ampersand a backslash ampersand the vid cat folder now I need to establish whether that folder exists or not and if it doesn't I will need to create it now for me by far and away the most convenient way to do that is using the Microsoft scripting runtime library we've used this in a few of the previous videos in the series but just in case you're not familiar with it what I'm going to get you to do is head to the Tools menu and choose references and then if you scroll down far enough you'll find the Microsoft scripting runtime which doesn't really sound as though it's got much to do with generating files and folders but it's only does so Microsoft scripting runtime if you check the box next to that and then click OK what we can do in this download file from page subroutine is declare a variable which I'm going to call FSO as a new scripting dot file system object once i've generated the destination folder the first thing we're going to do is check if that folder exists now for the time being I'm going to assume that the main download folder exists all the time that's going to be a fairly safe assumption as you'll see when we finish off the remaining part of this section but I'm going to check if not FSO dots folder exists then opens in parentheses destination folder close the parentheses then all I would like to do is say FSO dot create folder and then just pass in the same destination folder path it couldn't be simpler really hopefully you can see why I like this little object library so much the the filesystem object object is absolutely wonderful for manipulating files and folders so if I've created that folder then I can guarantee that exists so that when I set the destination file rather than it just being the file name and this root folder cease colon slash of VBA I can now pass in destination folder and then the destination file contains the backslash character so that should be absolutely fine at that point I know it's big fun it doesn't contain the backslash character at all I'll need to concatenate that in so let's make sure we've got this extra backslash character concatenated in otherwise that would've been a bit of a disaster okay so just to give this new little system a quick test what we'll do is head back into the process video pages module and then right up at the very very top after I've set the main download folder I'm then just going to simply call and get video page and if I then just trigger that entire sequence by running this main subroutine everything should start looping through again and apologies is going to take a little while to do but eventually those worksheets will appear we just ignore the not responding message for a few seconds we will eventually get there and then if I look back in the vba folder what I should find this time is I've got a whole range of different folders available so you can see that I've got a folder for Excel VBA a folder for power bi tools and a folder for VBA user forms and so on and so on and so on so each of the video pages that actually do have files to download have got their own separate folder to organize them a bit more neatly so that's pretty cool as well let's just copy all these or select all these and then just delete them again and then there's a few extra little bits and bobs that I'd like to do to wrap up this video I'd like to give the user a choice now about which folder they use is the root download folder so I'm going to use a file dialog folder picker to make that work I'll separate this out into another separate procedure but rather than a subroutine this I'm going to declare a function so I'm going to say function choose download folder the reason I want to use this great this is a function is because it's going to return the string which is the name of the folder that users chosen or if they choose to abandon the process it's going to return an empty string so what I'm going to do up here now is going to say set or main download folder equals choose download folder so what that will do is store the results in the main download folder variable the function itself is felt straightforward again I've got a video on using file picker and folder picker dialog boxes as part of the rest of the series so just to go through the basics fairly quickly will declare a variable called FD as a file dialog and then I'm also going to declare a variable called result as an integer so result as integer what I'm then going to do sorry I can't even spell integer at this point it is getting fairly late I suppose so dim result as integer okay so once I've done that I'm going to set FD equals application dot file dialog and then open some parentheses and the particular one that I'm going to choose is the folder picker which as the name suggests allows you to pick a folder so there's very little hard work involved here there's no need to reinvent the wheel when you want users to pick folders there are built-in dialog boxes to help them to do this we can customize them after a fashion so there's a few basic things we can do I'm gonna say FD title equals choose main download folder you can very much make this say whatever you like so maybe just bear in mind your target audience and then let's say you FD dot button name we can change the text on the button as well we can make that equal to I've no download to this folder something along those lines and then say what else we can set the initial file name so we can point the the dialog box to a particular folder to begin with what a nice thing to do here is point the dialog box to the users downloads folder so a neat way to do that is to use the environ function so I say environ open some put under-seasoned double quotes and then refer to the user profile that will give you the path to the root of the user profile folder and then you can concatenate to the end of that any other folder you like so it might be documents for instance so you could say as use me documents or it could be in this case I go for downloads and then close the double quotes what I then want to do is show the dialog box on the screen and I'd like to use that method to return the result as well as just so we can check what the users picked sung as a result equals FD Show so that will make it appear on screen then the users got a choice of things they can do them they can pick a folder and then click the select button or the main action button or they could click cancel or they could just close the dialog box down without picking anything at all so if I I'm going to say if result equals zero so if they haven't picked anything and they cancel or they just closed the dialer box and then what I'd like to do is say choose download folder equals an empty string and return that as the result else I would like to excuse me else I would like to say choose download folder equals FD selected items and then this is a one based index unlike the elements of HTML which is a zero based index so selected items one will be the first item that the user has selected in the list a folder picker only allows you to select one item anyway so that's the only thing we could go with here then I'll say end if and that will return the correct result back up to the main download folder variable okay so all we should do now at this point is check what the result of that method was so rather than saying just get video page will check if main download folder equals an empty string then we should probably have a message box that says no you didn't pick a download folder something like that and then something like sorry download folder back run let's have a VB exclamation symbol on there as well and then we can make sure we exit the sub so that we don't trigger the rest of the sequence okay so just to give that a really quick test let's just run this main subroutine and will see the folder picker pops up and if I do click cancel we'll just close down the dialog box then it says I didn't pick a download folder so nothing I'll just quickly tidy up again from the utility routine let's just go back and run delete all sheets but she won them back to processed webpages and if I just run this over it in one more time and I will indeed pick a sensible fold this and let's go back to the route by C Drive and I'm gonna pick my let's see I'm going to pick my VBA folder again why not and then click download to this folder and then the whole process goes round again and sorry to make you sit through one more time but we're nearly there we're going to just add one last final little flourish just to avoid having to open up this folder ourselves so having done that we've got all the same files in there again I'll just select and delete them one more time and then let's just have one final line of code right to the very very end of our subroutine just as one final flourish then what we're going to do is add a line of code that opens up the main download folder after everything has finished downloading so I'm just going to manually close down the VBA folder and I'll know how to reopen that one again and then right at the bottom of the main subroutine here that triggers the entire process we're going to use the shell function now the shell function just lets you put in the name of an executive program that you'd like to run so the one that I'm going to use in this case is going to be Explorer so that's the name of the program exe and then I'd like to concatenate to the end of that the path of my main download folder so I'm going to say amps and main download folder and then I'm going to type in a comma and there's an extra parameter you can use to specify how the window appears I'm going to say VB normal focus which will bring it into focus or bring it to the front so having done all that let me just tidy up one more time back in the utility routine and get rid of all my sheets go back to the process of web pages module give it a quick save and run the main subroutine from scratch choose a download folder let's stick it back on of course that same VBA folder do you feel free to choose a different folder each time by the way you just want to test that that works and then once again it's going to go round and round around and hopefully it won't take too long I should have shown over bloah to so many files and made so many fantastic videos should i sorry about that there we go finally we get the VBA folder opened up showing goes all the files we've downloaded so hopefully if you're following this video through you're gonna see some somewhat more folders in that over the course of time we're going to get more files uploaded and of course more videos uploaded in different categories as well hopefully that's given you a decent insight into how you can get files downloaded from webpages it ended up being quite a small part the video didn't it the downloading part a lot of that was to do with processing webpages and a bit more web scraping the techniques the downloading files part is pretty straightforward as long as you've got this declaration for the URL download to file function and from this point on if you've written it out once you'll never have to write it out ever again just copy and paste it please don't don't try to write it all out again from scratch every single time anyway hopefully that's given you a few ideas for some useful things you can apply those techniques to thanks for sitting through this one see you next time if you like what you've seen here why not head over to the wise our website where you can find those more free resources including these videos some written blogs and tutorials needing some exercises that you can download to practice your skills thanks for watching see you next time
Info
Channel: WiseOwlTutorials
Views: 73,951
Rating: 4.9359536 out of 5
Keywords: excel vba, visual basic for applications, vba, urldownloadtofile, download file, website, web scraping, file download, api, api function, windows api, ptrsafe, wise owl, html, xmlhttp, mshtml, msxml, filedialog, folder picker, shell, explorer, filesystemobject, create folder
Id: JPezrWwvsJM
Channel Id: undefined
Length: 69min 12sec (4152 seconds)
Published: Fri Nov 18 2016
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.