Wise Owl Answers - How do I get data from multiple closed Excel files using VBA?

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
this is a follow-up to a previous answer video i made which explained how to get data out of a closed workbook without having to open up the file in that video i showed how to connect to one single workbook and extract different sets of data from it in this video i want to go a step further and look at how to get data from multiple closed workbooks without having to open up a single file to get started i've got an almost blank workbook it's got some column headings in sheet1 and this is going to be where we'll paste all of the data from the closed files we'll loop over i've saved it as a macro enabled workbook and it's saved in a folder along with a subfolder called my files and that's the one that contains all of the files we'll get the data from each of these files has the same basic structure if i just open up one of them to show you what's in there it's the same column headings in the same order as our main worksheet but the the number of rows is different in each of the individual files the worksheet name is the same it's sheet1 in each file so to get that to work let's head into the visual basic editor of our main workbook and i will insert a new module into the project and begin a new subroutine called something like get data from multiple files the techniques we're going to use here rely on the microsoft activex data objects library so before we add any more code let's head up to the tools menu and choose references and then we can scroll through the list to find microsoft activex data objects you should find several versions of that library installed i'm going to check the box next to the latest version of the library i have available 6.1 and then having done that i can click ok and now i have access to a bunch of extra classes defined in that library which will help me connect to those excel workbooks the first object i'm going to create the first variable i'll declare i'm going to say dim cn as adodb.connection so that represents the the connection to the files that we're going to create one by one and i'm going to create a new instance of that connection object by saying set cn equals new adodb connection next i'd like to connect to a single file in the my files folder let's use movies 2016 to demonstrate to connect to that file i need to know which connection string to use the connection string is the most important property of a connection object it contains information such as what the data source type is where the file is located any credentials you might need to pass etc so to set it in vba i can say cn.connectionstring equals and then open and close a set of double quotes now if you're not sure how to write a connection string from scratch and to be honest i rarely do that myself let's head over to the connectionstrings.com website which allows us to find the connection strings for any data source we could think of you'll get an idea as to how flexible activex data objects are you can connect to almost any data source type you can think of using this same technique providing you have the correct connection string so we're going to go for an excel connection let's click on the excel link over here then we need to pick a provider or driver i'm going to go with microsoft ace oledb the access database engine if i click on that i then get a filtered list of connection strings i could use for various versions of excel so it's important of course to find a relevant version of excel if i start scrolling through the list i find that i think the last version listed here on this page is excel 2013. but the one listed for 2013 is actually the same for excel 2010 and for 2007 as well the connection string hasn't really changed since the xlsx file extension was introduced in office 2007. so at this point i'm just going to copy this connection string head back to the visual basic editor and then paste that in inside the double quotes i've got a little work to do just to tidy this up and make it readable so i'm going to start by splitting this onto multiple different lines let's use a space underscore character so that we can take this string down to the next line and then i'm going to identify where the semi-colon characters are that's the end of one particular property defined in the connection string so the first thing is the provider and then i'll type in a closed double quotes there and then concatenate that to a new line beginning a new set of double quotes the next thing is the data source we need to change that as well we'll sort that out in a moment but find the next semicolon character close a double quotes and then a concatenated concatenation character i should say an ampersand and then a space underscore before we head down to the next line now everything here is specified in a single property so extended properties tells us that we're connecting to a particular version of excel and a particular file type so that's one other thing that's quite important i'm connecting to an xlsx file there are subtly different connection strings for xlsx xlsb and xlsm so do make sure you get the right one so the other thing it specifies is that there are header rows or there is a header row in each file as well and there's one small problem with the extended properties those are passed in as a string within the greater connection string so you can see i've got a set of double quotes nested inside a set of double quotes and that's always a bad thing in vba because you get a syntax error when you try to enter it so i've got a couple of choices here i can either double up my double quotes within the larger string and that will work although it's potentially a bit confusing to read so let's change it to single quotes instead that's also a way to do it and i think that looks a little better now let's sort out the file path for the data source property of the connection string just to help me out later on i'm going to store the path to the my files folder in a variable so let's declare that variable first let's say dim my files path as a string and then after my list of variable declarations let's say my files path equals this workbook.path and then concatenate to that a backslash and the name of the my files folder and another backslash then i can place that inside the data source property of the connection string so i'm going to basically get rid of all of this up to the full stop before the file name extension i'm going to backspace that and i'm going to type in a double quote and then concatenate the my files path variable and then concatenate the extension name as well as the name of the single file we're going to connect to which was movies 2016. so in this first example of course we're just going to to connect to one single excel workbook movie 2016 just make sure i've got that spelled correctly and then what we can do is test that that connection is going to work so a simple way to do that is to attempt to open the connection to the file i'll provide a few blank lines where we'll fill in some interesting code shortly and then once i've tested that it's opened i'm going to then close the connection so hopefully nothing will happen or nothing obvious will happen when i run this subroutine if i step through it using the f8 key if nothing happens that's a good sign when i execute this line excellent the connection is opened and if i close it again and then end the subroutine if anything was wrong with the connection string or anything was wrong with a file then this line would have thrown a runtime error but everything's working correctly so we're good to move on to the next stage next i'd like to try to retrieve some data from the file we just connected to and to do that we need a container to hold all the information we retrieve and that container is going to be a record set object so let's declare a new variable to hold a reference to that record set we'll say dim rs as adodb.record set after i've created my new connection i'm then going to create a new instance of the record set class so we'll say set rs equals new adodb dot record set and then after i've opened up the connection i'm going to open up the record set by saying rs.open now there are a few properties of the record set object i need to specify for the open method so if i type in a space the two parameters we're going to set a value for are the source and the active connection let's set these on separate lines i'll use space underscore and continue typing on the next line i'm going to name my parameters just to help me understand what's going on here so source colon equals and it's at this point where if you know some sql some select statements you're at a big advantage because the source is essentially going to select all of the columns from the worksheet in that workbook and we're going to do that in a set of double quotes using a select statement select star or select asterisk the asterisk there represents all of the columns in the table we're about to reference and we reference the table by saying from and then for an excel worksheet in some square brackets we can say sheet 1 followed by a dollar sign we can then close the square brackets and close the double quotes so in this example all of the workbooks have a single worksheet with the same name sheet1 so that will work nicely we'll type in a comma followed by a space underscore and then we're going to set the active connection parameter as well so activeconnection colon equals but we only have one available it's called cn once again just before we test this i'm going to give myself a few blank lines and then make sure that we close the record set afterwards and just to check that this is actually working let's use the f8 key to step through we know the connection works and that opens up does the record set open up as well well nothing went wrong so i'm going to assume that it did we can investigate some of the record set properties in the locals window if you like but i'm going to just close down the record set close down the connection and then end the subroutine the final step for this quick test is to write the contents of the record set out into the worksheet before we close it this is the easiest part of the whole process if i simply refer to the top left hand corner cell where i want my data to be pasted that's range a2 on sheet1 i can use a specific method to copy data from a record set so let's say sheet1 dot range a2 dot copy from record set and there's a single parameter that i need to specify for this method it's called uh data as unknown and it wants a reference to a record set object so i can simply pass in my rs variable so at this point if i run the subroutine we'll find sitting from cell a2 downwards the list of all of the films contained in sheet 1 of the movies 2016 workbook just before we carry on i'm going to tidy things up a little bit of course in a moment we're going to try to construct a loop to build up a list of films and i don't want to just continue pasting into cell a2 i want to make sure that i paste into the next available blank cell at the end of the list and the technique i'm going to use to do that is to start from the bottom of the worksheet and exhale upwards and then offset one cell further down from there so to do that i'm going to head back to the visual basic editor and rather than saying range a2 i'm going to change that to say range a 1000 48 576 and then before i copy from the record set i'll say dot end excel up and then i'll say dot offset one comma zero dots copy from record set rs and that will guarantee my list will be built up step by step i'd also like to test this subroutine a few times and i want to make sure that i get rid of any existing data from sheet 1 before i begin so to do that i'm going to head up to the top of the subroutine and then add a line of code which says sheet1 dot range a1 and then i'm going to refer to the current region property so you can imagine the current region if i have cell a1 selected if i hit control a that represents the current region i then want to shift that region down by one row and then clear everything from the remaining cells so i don't want to delete or get rid of the row headings or column headings i should say so to do that current region dot offset one comma zero and then i can simply say dot clear to get rid of everything having done that i can run this subroutine as many times as i like now and i'll just continually end up with that list of films from the 2016 worksheet but that sets us up for an easier life when we write our loop to process all of the other files in that folder now we need to add in the code that will loop through all of the excel files in the my files folder and substitute in the file name for each file that we encounter there are several ways we could do this i'm going to rely on the dir or the directory function and to do that i'm going to use another variable to capture the name of the file that we're currently looking at so let's say dim movie file name a string and then after i've set my files path i'm going to say my sorry movie file name to find the first file movie file name equals dir and then open some round brackets where i can specify the path name now again there's a few different techniques i can use here i've covered this in previous videos so i won't go into too much detail but i want to make sure that i'm looking at excel files in this particular folder so i'm going to say my files path and then concatenate to that a search string which has an asterisk a wildcard character dot xlsx so that will find any excel file or any xlsx file i should say in the folder stored in the my files path variable so once i've done that i'm just going to check that this part works by concatenating the movie file name into my my existing connection string so to make that work let's get rid of the current file name movies2016.xlsx i need to leave in the semicolon character there so i'll leave that concatenated to the end of that line and then after my files path i can concatenate movie file name so at this point we're still only working with a single file but more dynamically now because we're finding the first file in that folder so rather than finding movies 2016 this time we should find movies 2011. so if i run that subroutine again the list of movies on the on the sheet now should come from 2011 so that parts working quite well okay now i just need to continue finding xlsx files until i run out of files in the folder so to do that let's add a loop just above the connection string line i'm going to say do until movie file name equals an empty string so when the directory function or the dir function fails to find another file it returns an empty string so i'm going to carry on repeating this set of instructions until that happens i'm then just going to highlight all of the lines of code down to the cn.close line and then tab that in one space and after i've closed down the connection to the file that we've just copied the data from i'm then going to try to capture the next excel file by saying movie file name equals dir so if we call the directory function again without passing any values to its parameters it essentially continues the original search but at the next file so at that point i can say loop back to the top and it will continue to go around until we run out of excel files so at that point let's just give the entire system a quick test just to check i've currently only got the 2011 movies in that workbook if i run the subroutine at this stage it should fairly rapidly get through the entire list of files i appreciate there are only six with small volumes of data in them but still it's pretty quick and there we go all the movies from 2011 to 2016. so there we go there's the core part of the system working we've extracted all of the data from multiple closed excel workbooks without needing to open up a single file but from this point on we can get a bit more creative about exactly which set of data we retrieve from each file if you know much about how select statements work in sql you may already have some ideas here about how we could alter the list of data we retrieve by adding a where clause to the select statement so as a first quick simple example let's say we wanted to retrieve only those films where the oscar wins value was greater than zero so we only want to see the films that won an oscar to do that in the select statement we can add a where clause to the end so we're going to say where and then we can say oscar wins now to refer to the column name we need to be careful to check whether there's a space in the column name if there is a space in the column name we need to enclose the column name in some square brackets you can always enclose column names in square brackets so as a matter of course you might prefer to do this anyway but in this case it's absolutely necessary so oscar wins greater than zero having done that let's run the subroutine again and we should find a much smaller list this time this time only the oscar-winning films from that collection of files we can add multiple criteria to the same query so let's say we wanted to show only the oscar-winning animation films so we could also check the contents or the value of the genre field to do that we can head back to the vb editor and to add another criterion to the end of the where clause we need to decide whether it's an or or an and criteria just like in vba and all criteria if either of the conditions is met then the answer is true with an and condition only if both conditions are met is the answer true so in this case it's definitely an and i want to see only the oscar-winning animation films so i'll say and genre equals and then to enter some text just like we did for the extended properties in the connection string we'll write wrap these in some single quotes so animation having done that i can run the subroutine again and this time we'll end up with an even shorter list of the three oscar-winning animation films from that collection of files so at this point it's really all down to how much you know about sql queries in terms of how far you can go just to give you one final idea i'm going to switch back to bringing back all the movies so i'm just going to take away the where clause from my select statement all together so it's back to select everything from 1 then i can run that subroutine to retrieve the full list of films finally i'd like to write one criteria which excludes any film which has got the word twilight in its title and to do that we can head back to the vb editor add our where clause and i'm going to say where title not like and then in some single quotes contained within a set of percentage symbols which act as the wild card characters in sql the word twilight so at the moment we have got the twilight films in the title column unfortunately but when i run this subroutine now i'll find the very satisfyingly now the twilight films have disappeared so if you'd like to learn a bit more about how sql queries work and you two want to eliminate the twilight movies from your life you can have a quick look at this sql server queries playlist that teaches you everything you need to know about writing basic sql queries gets onto some fairly complex queries by the end if you don't have a sql server to play with then you may prefer to stick to vba and we've got a few videos which explain how to use ado with microsoft access so there's a few extra hints and tips about writing queries in there as well but anyway i hope you found that one useful feel free to ask any questions on that and i'll do my best to answer them thanks very much for watching see you next time you
Info
Channel: WiseOwlTutorials
Views: 2,718
Rating: 5 out of 5
Keywords: vba, visual basic for applications, excel, microsoft, macro, macros, ado, adodb, activex data objects, connection, recordset, connection string, database, closed workbook, closed file, copy paste, tutorial, training course, online tutorial, online training, free course, free tutorial, free training
Id: Q1zhMFAWgg0
Channel Id: undefined
Length: 20min 36sec (1236 seconds)
Published: Sun Apr 11 2021
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.