Combine all the files in a folder using Power Query | Excel & CSV | Excel Off The Grid

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
in this video we're looking at two methods that we can use to import all the files in a folder into Power query so that we end up with a single table so we can put lots of data files together into a folder and power query just combines them together it's like magic and if we add new data files to that folder they get included in that query too now there's one thing to be aware of that if we use this method there does need to be some level of consistency between those files power query is Magic but it can't achieve the impossible now in this video we are going to look at CSV and Excel files and two methods to deal with them they're probably the most common file types that we encounter as Excel users so if you're ready let's get started if you want to work along with this video then you can download the example files and you can find links for that in the descriptions box below in the downloads folder you'll find eight files for CSV and four Excel files for January February March and April 2019. initially we're going to start by just looking at the CSV files CSV files are easier to work with because they only contain one set of data they only or should only have one header row and they don't allow any merged cells so to begin with selected January and the February files and let's move them into the import folder to see what any of these files look like you can see that I have the January One open here we have date customer product sold by and value and we want to combine all of these together into a single table so let me close that and then I want to move those files into my import folder this first method we're going to combine the CSV files together so in Excel in a brand new workbook we can go to data get data from file from folder if you then navigate to the folder where those files are saved mine's called import folder and then click open power query then gives us a preview list of those files we want to click on the transform data button okay that has now loaded the main power query preview window and we can see the four files listed that were in our folder initially we just want to work with the CSV files so the extension column selected when we go to transform and then format and then make sure that all of those are lowercase this means that if we want to work with files that have any uppercase or lowercase characters in the extension that they'll still work correctly then I want to filter this column so that it only includes CSV files so on the text filter I'll go to equals and then the filter rows dialog box I'm going to put equals dot CSV and then click ok right that's now filtered down to just include the CSV files next we want to combine all of these files together for this I'm going to click the combine files button and this is the button that performs all the magic to combine these files together so I'll click that the combine files dialog box has opened up I'm just going to click OK on that and then power query will create a number of additional queries and steps to help us combine these files together there we go on the left we have a whole load of more queries and on the right we have more steps let's just scroll down in our data to see what's happened you can see here that a row 50 that's where our February data ends and the row 51 is where our January data begins so this has combined both of those CSV files together into a single query now if you're working along with this method you don't need to change any of these helper queries so the parameter one the sample file and the transform file you don't need to touch those at all actually you shouldn't touch them at all but we have our transform sample file this is a query that we can change because what happens is that for each file in our folder any Transformations applied to our first file the sample file is then applied to every other file in that folder and then combine together in the input folder your query might be called something different mine is just called import folder because that is the name of the file which all that data was from so ideally if there's any transformations to make we want to make them in the transform sample file first and then any other queries that can't go in there we'll put into that main query so the input folder in this scenario an import folder power query also created a number of additional steps let's just look through these steps to see what's happening so we have the source then we change the text to lowercase we then filtered to only include dot CSV files power query then ignored any hidden files after that it invoked the transform file function now this function is based on whatever changes we make to the transform sample file so any changes we make to the transform sample file are automatically added to this function which is then applied to every file in the folder it then removes all the other columns and drills down into those tables and then finally it's applied a changed type step so that means if we want to get back any of the columns that were previously removed we can come back to the remove other column step I can click on that gear icon and for example I could add the name column back in and then if I were to come to the end you can see that that name column is in my final query in this scenario we don't need that so let me just remove that column right let's head across into the transform sample file and make some additional transformations you can see that the applied steps have source and promoted headers we're happy with those I'm not particularly happy with the data types so for a date now because this is a CSV file if this CSV file comes from a region that uses a date format that is not the same as yours then it might give you a strange date if you converted this into a date format so if you're working in the US for example if you were to convert this to a date format it wouldn't give you valid dates instead because it's a CSV file we're going to go to using locale then in the data type select date and the Locale so the source file is English United Kingdom so it's using dates based on English United Kingdom and you can see some examples there I'll click ok the other columns so the next three should be text and then my value column should be a currency for example right now let's head to our main query and you can see that this already has data types applied but our value column is now a whole number rather than a currency and this is because any data types are not transferred from the transform sample file it's actually this additional step here of change type that power query added when it originally created our helper queries and helper steps so let's just change these data types now because date was already changed using Locale in the transform sample file that means that it's already in an acceptable date format so I'm just going to select all of those and detect data type the only change I'm going to make is to change value into a currency and replace the current step perfect I'm now going to go to home close and load close and load 2 I'm going to load this query into Excel so I've got a table on the existing worksheet and then I'll click ok perfect if we scroll down again you can see that we have around row 50 152 is when it switches between the January and February data before we add any new data to this folder and click refresh let's have a look at method number two okay for method number two I've got a new blank workbook open again I'm going to go to data get data from file from folder I'll select my import folder again and then click open again I'll click on the transform data option in here I'm going to select my content and name columns and then I'm going to remove columns remove other columns next I'm going to add a custom column so add column custom column and in here we're going to write an M code function I'm going to call this column CSV and the function that we want to use is CSV dot document remember that anything we type in power query is case sensitive well the content column in there and I'm just going to click ok actually I've forgotten to filter out my CSV files so let's go back and add that step so in the extension column to go to transform format lowercase I'll insert that step and then we want to filter the text filter so that it only equals dot CSV files dot CSV okay now we come back to my added custom step I'm not going to click on the combined files option instead I'm going to click on the expand option I don't need to use the original name as prefix and then I'll click ok right if we scroll down we should be able to see that here we go row 52 we changed from February data into the January data so it has combined these two files together but instead of having multiple helper queries and a main query we just have this one query called import folder which means any Transformations we make we're making to the entire data set I'll start by selecting the content and name columns and then I will remove those columns then under the transform ribbon I can use the first row as headers down here where our new file starts we still have our headers included in that data set so therefore we need to remove that so I've right clicked on that cell gone to text filters and I'll select does not equals so now anything with the word date in that date column has been removed right let's apply our data types again it's a CSV file using a date so I'm going to go to using locale it's a date data type and the file is based on English United Kingdom dates click ok let's change those columns to text and finally my value can change to a currency Perfect all looks good let's close and load that into Excel so home close and load close and load two I'll select a table inside the existing worksheet on cell A1 and then click ok perfect if we scroll down once again we can see that we do have January and February's data both included in this table okay now let's add more data into our folder and see what happens when we refresh okay so I'm going to select my March files and then them both into the import folder then in My Method one query I'll go to data and click refresh all if you watch on the import folder it currently has a hundred rows loaded when I refresh that it now has 150 rows loaded and as we scroll down you can see that that March data has now been included let's have a look at book number two which was method number two again our import folder query says a hundred rows loaded when we refresh all it now has 150 rows loaded so it includes that March data also okay we're about to add another file into our folder but there's an issue we're going to find out what happens if we don't have file uniformity because in the file that we're about to add we've changed the name of a column and this is to illustrate the problem of what happens if our files aren't all structured in the same way okay so let's head back over to Excel and see what's going to happen so I'll open up our April 2019 CSV file and you can see here that in our previous query we have a column which is called customer but in our April CSV file that same column is called sold two so let me close that CSV file and let's add these files into our folder so I'll drag those into the input folder and then with method one I'll click refresh all and you can see there's an error because the column customer of the table wasn't found let's do the same with method two so again in here go to data refresh all and again that same column wasn't found so this just illustrates how important it is when we use these techniques that our files do have a consistent layout okay now let's move on to look at Excel workbooks because there are a few different challenges that come with Excel workbooks the method to import and combine Excel workbooks is very similar to the method to import and combine CSV files however Excel workbooks are much more complex than a CSV file for example a workbook might contain multiple worksheets it might contain multiple tables and then on any one of those worksheets that first row might not be the header row so we're just going to move on now and look at method one and Method two using Excel workbooks as the source and we'll also see a little Quirk of how we can work with Excel worksheets when they have different names okay so let's head on to Method number one so here I've created the new Excel workbook I've already connected to our folder I've already filtered to only include.xlsx workbooks the other thing I've done is because we know that there's an error inside our April file is that I've removed that from our import folder so we only have three xlsx workbooks so now as before we can click the combine files option so here you can see that we have our workbook and it has a worksheet called February now this gives us a problem because when we select the file called February that means that power query will look for a worksheet called February inside every workbook but the January workbook has a worksheet called January and the March workbook has a worksheet called March so normally for this method to work we need to make sure that our worksheets all have the same name however if our workbook only contains a single worksheet then there is a method that we can use so that we remove the search for the name and that's what we're going to look at in a few moments time so power query has now created all of our helper queries and also added the additional steps into our import folder query so these are the same as we saw before now if we scroll down you'll notice that the end of the February file we see a line that's called error if we click on that we see an error occurred in the transform file the key didn't match any rows in the table you can see that it's referencing the item February and as I said this is because each of our worksheets has a different name so let's fix that we'll come into the transform sample file we'll look at our source step and then the navigation step where the kind is a sheet if you go back to the previous step it's filtered items called January where the kind is a sheet so we only want sheets that's true so let's remove this section there that says item equals February click away on that and then let's make the other transformations so I want to remove rows remove top rows I'll insert a step remove three rows and then we can then promote headers and I'll select all the columns and go to transform detect data type because our source is an Excel workbook and it's likely those dates are valid dates because Excel handles dates using a serial number basis so therefore we shouldn't need to use the using local method I'm happy with text text and text for the next three columns and then values should be changed to currency thank you we'll replace that current step you'll notice that we now have an error on our input folder so let's click on that you can see the column sales data of the table wasn't found if we look in our formula bar at the top you can see it's looking for sales data column one column three column four column five this is because we removed rows at the top of our query so let me remove that change type step I'll select all those columns again and then we can apply it correctly I'll replace that current step and then we can close and load this into Excel so home close and load closing load two we want the table on the existing worksheet in cell A1 and then I'll click ok right you can see there we have 150 rows loaded which means that we have January February and March data all loaded together inside this single table okay it's now time to move on and look at method number two using the Excel workbook as the source okay here we are back in power query I've connected to the same folder and I've filtered down to just include the xlsx files I'll select the content and name columns right click and then remove other columns we'll then go to add column and create a custom column and call this column Excel the function we want to use is Excel dot workbook just be careful about the case sensitivity there the E and the W should both be capital and open a bracket double click on the content column close that bracket and then click ok we can then expand that table as before I don't want to use the original column name as prefix and then I'll click ok now rather than drilling into the data what we've done is drilled into the workbook in that workbook we have this article data each of those is a data object and in this scenario it is a work sheet so if we wanted to we could filter out specific items here in the item column if we needed a more advanced approach I'm just going to right click on my data column and go to remove other columns and then I can expand the data column click OK on that and now we have all those files stacked on top of each other but we do have those header sections included in there so we just need to undertake a few more transformations first of all from the home ribbon I want to remove rows remove top rows and I'll select to remove the top three rows then from the home ribbon I can use the first row as headers so that's given us a header for our column but we just need to remove these other values that aren't part of our data so on the null value I'll right click on that filters does not equal and then the same with the value I select those columns I'm going to transform and then detect data type the only change you want to make to that is to change it to a currency place that current step and then finally we can go to home close and load close and load two a table on the existing worksheet and I'll click ok fantastic we've now loaded that data into Excel as well okay that brings us to the end of our video where we showed two methods of how we can combine Excel and CSV files together from a folder this is a fantastic method that can save so much time because we can simply drop files into that folder refresh and power query brings them all through however we saw the issues that can happen if our files are not uniform if those column names are not the same it can cause us an error equally ideally we want our Excel worksheets to all have the same name then they get picked up inside their M code and can easily be combined so therefore again uniformity is key to success with these methods so if you enjoyed this video don't forget to subscribe thanks for watching and I'll catch you next time
Info
Channel: Excel Off The Grid
Views: 13,461
Rating: undefined out of 5
Keywords:
Id: uPPn2ejoZ1o
Channel Id: undefined
Length: 24min 42sec (1482 seconds)
Published: Thu Nov 10 2022
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.