You're Combining Files from a Folder with Power Query ALL WRONG!

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
we can automate the Gathering and consolidating of files from a folder easily with power query but most people do it wrong in this video I'll show you how to get files from a folder so any new files are automatically included on refresh and I'll show you the errors most people make and what to do instead by the way this applies to both Excel and power bi in the folder here you can see I have seven CSV files by the way you can also get text files and Excel files from a folder which we'll look at later now I recommend that the folder only contains the files that you want to consolidate that said you can pick and choose the files you want to import it's just safer and cleaner to keep the folder dedicated to your Source data files trust me so in this example each file contains three columns of data date region and revenue ideally they should have the same number of columns with the same column names although they don't need to be in the same order now typically data exported from another system is in a consistent layout so you shouldn't have any problems let's close this down and we'll go and get the data from the files with power query you'll find it on the data tab of the ribbon get data from file from folder and if you're using power bi you'll find it on the Home tab of the ribbon get data more and then folder from there it's the same so let's go back to excel here I can browse to the folder containing my files I've already got it selected so I'll click open at this dialog box you see a list of the files in the folder here you can choose to load them right away or combine and transform combine and load and combine and load two I always choose transform data this is going to allow me to filter out any files I don't need keeping my file as small as possible so here we have a list of the files and the metadata I can filter out any files I don't need at this point now keep in mind if you save the file containing the query in the same folder as the data it's going to get double counted which is why it's a good idea to Only Store the data files in this folder and save the file containing the query outside of it now I need all of these files so I'm ready to combine them and we do that by clicking the double down arrow on the content column this launches the combined files dialog box where you can see a preview of the data now by default it chooses the first file as the sample file but you can choose a different one from the list it's detected the file origin but you can also change it here and the delimiter it's going to use the first 200 rows to detect the data types you can choose the entire data set that's going to make things quite slow potentially or you can set it to not detect data types I'll leave it based on the first 200 rows once you're happy with these settings click ok power query goes and gets the data from the files you've selected and it consolidates it into one table now on the left is a list of the queries that were automatically generated by power query and there are two items in this list that are important and you can forget about the other ones the first one is transform sample file this is the query that power query uses as a template for consolidating all of the files into the final query it only contains one file of data the file you selected at the previous dialog box and then the final query that consolidates all the files into one table if you want to make some transformations to the data before loading it you can make those changes in the sample file or the final query except and this is where people go wrong if the Transformations need to be done before all the files are combined now in the case of this data it's already clean and ready to append so any Transformations can be done in either query let's take a look at some data that's not so clean here I have some sales data files and if I open one of them you can see that the column headings are split over two rows which is a No-No because tabular data structures which we should all strive to use and power query returns for us only have a single row of column labels now we can fix this data layout in power query before combining the files let's take a look so again on the data tab get data from file from folder and my files are in this folder called Data so I'll click open and here I'm going to go straight ahead and combine and transform because I want all of the files and that's just going to save us a bit of time now because I'm combining Excel files I get a slightly different dialog box here because Excel can have multiple sheets whereas CSV files and text files only ever have one sheet I've got one sheet in this file if I had multiple sheets they'd be listed here as would any tables that I have on those sheets so I've got a sheet called sales now the sheets in all of the files you're getting also need to be called sales likewise if you were getting data from tables the table names in each file need to be the same so I've got a sheet called sales and there's my data let's click ok and you can see the first row has been appointed as the header row but then our second row of headers is in the first row of the data and if I scroll down to row 2155 you can see the headers appear again at the start of the second file and they'll repeat like that throughout the data set for each file in the folder and it's at this point that most people go wrong because they try to fix the layout in the final query when they should be fixing the data layout before combining the files and that's done in the sample query this would also be the case if you needed to unpivot data for example so let's go and do that in the sample file query I'm going to remove the step that promotes the headers now both of my header rows are in the rows of the data I can transpose it now I've got my headers in two columns let's right click and merge those columns into one and we'll separate them with a space the name of the column doesn't matter I'll click ok there's my new column headers let's transpose the data back now I can promote them use first row as headers there's my new header row and we've recorded all the steps on the right hand side so all of these steps are going to be applied to each file in the folder before they're combined now if we go back to our final query we get an error and that's because this last step here change type is looking for the old column names which we've fixed so all we need to do is delete that that's fixed it now we can select all the columns again holding down shift select the last one and then on the transform tab detect data type and we're back now if I scroll down to row 2155 you can see the extra header row is gone and that'll be the case throughout the data set now let's do a little bit more tidying up I don't need this Source name column it's just the name of the file so press delete let's give the query a better name this is order data so I'm going to call it orders it's important that the name you give the query is useful because that name is going to flow through to powerpivot if you load it into the data model it's going to become the table name and the sheet name now I get asked about this a lot so to reiterate if you need to make Transformations before the files are combined you should make them in the sample file query if the Transformations can be made after the files are combined then you can use either the sample query or the final query all right I'm ready to close and load now in Excel it's a little different to power bi let's take a look in power bi I can only load the data to the data model so on the Home tab of the ribbon I have close and apply and that's all I can do in Excel I have some more options on the Home tab I can close and load and under close and load two I can choose from a few different destinations I can load it to a table in the workbook a pivot table report or a pivot chart or if I'm loading it to the data model or I'm using this query as an intermediate step I might only want to create a connection I can also put it on an existing worksheet or a new worksheet let's load it into a table on a new worksheet and there's my data notice the sheet name is orders and the table name is table underscore orders and that's my query name flowing through to my data set now one of the most awesome things about power query is the ability to get updates with the click of a single button let's say we just received the next period sales data let me paste it in so now we've got 2020s data let's go back to the query and I'll just check we've only got 2019 2018 and 2017 data in order to get the new data there's a few ways I can refresh this query I can right click it and refresh on the data tab of the ribbon I can refresh or refresh all queries or if I've only got a connection to this query I can open the queries in connection pane via the data Tab and on the right hand side I can click the little refresh icon here so let's go ahead and do that you can see it's already refreshed and if I look at my drop down now I've got 2020 data in power bi you'll find the refresh button on the Home tab of the ribbon and refresh alternatively in the data View you can select the table and via the Ellipsis you've got refresh data I hope you found this tutorial useful and if you like this video please give it the Thumbs Up And subscribe to our channel for more and why not share it with your friends who might also find it useful thanks for watching foreign [Music]
Info
Channel: MyOnlineTrainingHub
Views: 24,647
Rating: undefined out of 5
Keywords:
Id: Nbhd0B5ldJE
Channel Id: undefined
Length: 10min 18sec (618 seconds)
Published: Tue May 23 2023
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.