How to Merge Excel Files with Different Headers in Power Query | List.Accumulate

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
Today, we'll cover a{} more complex data consolidation scenario in Excel, because let's face it, in real life we usually run into problems that we don't expect, and they aren't covered in a standard basic training, so here's a scenario. You want to combine and append the data from different files that are saved in a folder, but the catch is this, you need a mapping table for the headers, the data in each file can have a different order, and they can also have different texts, so they're not consistent. Luckily, you have a mapping table which you'll use, and you're going to use Power Query to get it all automated for you. (upbeat music) Okay, so the files that we want to append are sitting inside the info folder on my C drive, and I want to set this up in a dynamic way, so that whenever I add a new file to this, and that file ends with underscore HR, and it's an Excel file, I want that data to be automatically appended, but here's the catch, let me show you the content of the files here. So this is from the company Blend, and they send their information in this way, where the first column is name, then salary and then department, now, this is an Excel table, and the table is called TSalary, okay, so name, salary, department. Now, the problem here is that the other companies don't follow this order, so let's activate the preview pane here, we have a table here, now these tables all have the same names, but the problem is that Kryptis sends the information with headers in Italian, Misty Wash has it in German, and they also have a different column order, what they have all in common, is that the tables have the same name. Now, we've told these companies many times to correct their names before they send it to us, because it's going to give us a problem when we're appending the data, unfortunately, they don't listen, so our solution is to create a mapping table for all these headers, and we've added the mapping table to our final file, where we're going to have our report. This is the mapping table, it has the list of before names and the after names, after names are the correct names, so we want to have name, department and salary in our final report, and this makes it also easier for us to maintain this, so if anyone sends us information that's missing, because they used another header name, all I have to do is add this information to my mapping table, and then I just need to refresh my final report, and everything is there, but how do we do a mapping of the headers? So this is not mapping of the content but mapping of headers. First thing I'm going to do, is create a connection to the mapping table, this table is called Mapping, let's go to data and send this to Power Query from table, range, and leave the name as is, all I want to do is create a connection to it, so close and load to, and only create a connection. Next thing I want to do, is import the data from the folder, so let's go to get data from file, from folder, the folder is called info, click on open, here I can see a preview of the files included, that's the right folder, let's go ahead and transform the data. I'm going to leave the query name as is, one thing I want to do is to add a filter to this, go with text filters, click on contains, and let's go with underscore HR dot xls, now, this makes sure that only the files that are Excel, and they have an underscore HR in the title, will be included, if there's a word file or a PDF file, those will not be included in the final query, now, let's go ahead and click on these double down arrows here to open up the content, Power Query's going to go and do its thing, it's going to take a look at the sample file here, so notice it picked the first file as the sample file, and it's giving me the content of this file, which only has one sheet and one table. Now, the table that all of these files have in common is called TSalary, so I'm going to go with that and click on okay. Now, Power Query is going to pick the sample file and apply any transformations it needs to that sample file, then use that to apply the same transformations to all the other files, and then it will automatically append the data in our final query here, so it did all of this on its own, but there is a problem, right? Because I have missing data because not all the files have the same headers, so name, salary, department, is actually the correct header that I want, but only one of the files has all of this information, the second file, remember that was from Kryptis, this was in Italian, so I'm missing the information because the headers don't match, then another file is Misty Wash, that had the correct header for name, but salary and department information is missing. I need to map the headers before I append them, which means I need to do the mapping on the transform sample file query and not at the end here, because the moment a file is loaded, it should already map the headers and then it should append the information. Now, just so that I can visually see that whatever function I use is actually working, I want my sample file to be a file that has different headers, so let's go ahead and change our sample file. Remember when we were in the navigation step, Power Query suggested the first file to be our sample file, if I go back to the previous step, we can see the first file is Blend, so in the navigation step it's taking position zero, which is the first file, because remember, Power Query is zero based, so if we want the second file as our sample file, we need to change the index to one, because this is zero, one, so let's go ahead and do that here. Now, the moment I press Enter, Power Query's going to add another step which we don't need, we need this to end in navigation, our sample file is now Kryptis, we should be able to see it in the sample query, and we do, we now have the headers in Italian. Now, in info we're running into a problem, the column name of the table wasn't found, so remember, originally the column names were in English, so I can't apply the change type step here, but if I remove this, I should see the new column names up here and I do, and I'm missing a lot of information because Blend is not there anymore, the only information that I'm going to have, is going to be from Kryptis, everything else is missing, so your sample file is actually the driver of your final version, because it's going to take the headings of your sample file. Now, we need to apply our header transformation at this stage here, now, the way I'm going to transform the headers in this example, is to use List.Accumulate, which is one of my favorite functions in Power Query, it allows us to loop through this list, see if there's a match between the existing header and any one in our before list, and then replace it with the after list, so let's go ahead and do that, in our transform sample file, let's add a new step. Now, this is the name of our previous step, our previous step was navigation, and this is showing us the table that we selected in the navigation step. Now, List.Accumulate is something that we're going to use inside a function that's going to help us do the final transformation, that final function is going to be table.transformcolumnnames, that's the main function that we need. This function needs two arguments, one is the table and the second is namegenerator as function, so it expects to have a function here, that it can apply to each single column, this is where we're going to use List.Accumulate, so I'm going to go with each, because I want it for each single column, now, let's go with Shift+Enter, and let's expand this, so that we can see better, now, don't worry about the error now because we're not done with this function, so we're just going to continue, and go with List.Accumulate, actually, it's right here, press Tab, now, open bracket and see what we need, first thing, is a list as list, now, our list is going to be our mapping, but our mapping currently is a table, so we need to transform it into a list, we can transform it into a list of records or a list of lists, as long as it's a list, so I'm just going to go with Table.ToRecords, and transform it into a list of records, so that's the mapping table, and that was the first argument. Second, is the seed as any, so seed is the first item that we're going to put through this function that we're going to be creating, the function is called the accumulator function, and this can be whatever you want to make, it's very similar to recursive lambda, if you're curious what a recursive lambda is, (air whooshing) I have a video on it, and I've added it to the description of this video. So in this case, our seed is our header, so it's each header, and for this we can use an underscore, now, comes our accumulator function, so this function needs two things, now, let's press Shift+Enter to go to the next line, the two things it needs are state and current, now, you don't have to call them state and current, you can call them anything else that you want, but I prefer to use this syntax, because it's easier to distinguish between the two different states. Now, here we need the go-to operator because this is a function, our function is going to use Text.Replace, so let's go with Text.Replace, we want to replace our current text, which is the header, this is something we have to keep in mind, is that because we're inside the accumulator function, we can't use the syntax we have here, so I can't put an underscore here, instead, I have to use this vocabulary, state and current, so my seed is going to be my first state, but remember, I can't just leave it like this, because my headers are not in upper case, and need to make sure everything is in upper case, so I have to put my state in the text.upper function. First, I transform it into upper case, and then, I use that in the Text.Replace function. In Text.Replace, I need the old as text and the new as text, so the old text I'm comparing this to, is the first item in my mapping table, in the before record, so remember, I transformed my mapping table into a list of records, the current is going to be the first item in my list of records, but I need to specify which one, before or after, well, I want my old text to be before, and my new text to be current after. {}, we have to be careful that we have to have the headings typed in correctly, now, I'm pretty sure I had them in upper case before, so let me just fix that before I press Enter, now, let's make sure we have closed our brackets properly, and press Enter, and we get these transformed correctly. Now, List.Accumulate is not an easy function to get the hang of, that's why I've dedicated a whole section to understanding this function in my Power Query course, but basically what's happening here, is that in the first run of this function, it's going to take our previous header, which was departamento, that's going to be your state, it's going to compare it to the first item in before, so if I go here, it's going to compare it to this, and then it's going to replace this with this, but there is no replacement to be done, because full name is not included in departamento, and then the new state is going to be the result of my transformation, in this case, the new state is the same as the old state, it remains departamento, and then it goes on to the next line here, and it tries to do that replacement, so the great thing with List.Accumulate, is that you can do multiple name replacements in one go, because you change this current state of your outcome, as you move through this list. I understand that this is very geeky and a bit difficult to understand, so you can skip through my explanations and just go with the final outcome here, or if you'd like to understand it better, consider checking out the full course, but ultimately this is all we need to get this to work, well, let's make sure it really does work, so here we have department, name and salary, we have the values of Blend, and if we scroll down, well, we have Kryptis, that was the original file that we were working with, Misty Wash is there, it looks like everything has been mapped correctly. Now, if anything is missing from our mapping table, the data will be missing here, all we have to do is update our mapping table and everything will work. So let's go and load this to our workbook, existing worksheet, I'll just put it right here and okay, and that's our consolidated data file, I think I was just too excited to do the final corrections to this, I forgot to just grab the company name, and also update the data types, but before I do that, let's test this whole concept, I'm going to get a new file from another company, and one of the headers is not going to match, so the file that I have is on my desktop, and I'm just going to drag and drop it here into the folder, let's take a look at the preview, department is correct, for name I have first comma last name, and then salary is correct as well, so this one is missing from my mapping table, well, what happens if I just go ahead right now and refresh this? Let's see what we get. The information should be appended, it's right here, and the name is missing, so now, let's go ahead and add this to our table, so it was first comma last name, but notice everything is in upper case here, so we don't have to worry about the different cases we might have, so first comma last name, and that should map to name, now when I refresh this, we should see our names pop up here, and we do. Okay, so now let's go ahead and make the final corrections, just grab the company name by going to transform, extract, text before delimiter underscore HR, let's pick that as the delimiter and okay, change this to company and also update the data types, Control+A to select everything, under transform, detect data type, all our steps are added and recorded, and we're just going to go and load this back to our workbook. Now, in case you're dealing with a lot of data, one thing you can do is to buffer your mapping table, this is going to give you a performance boost. So that's one way of solving this, but there are always different ways to solve a problem, the more you learn, the more choices you have, and then you can pick the better one based on the situation. If you'd like to improve your Power Query skills, so you have more options to approach a problem, check out my complete Power Query course on xelplus.com. As usual, thank you for watching, don't forget the thumbs up before you leave, and if you're new here, consider subscribing, and I'll see you in the next video. (upbeat music)
Info
Channel: Leila Gharani
Views: 59,021
Rating: 4.9417367 out of 5
Keywords: XelplusVis, Leila Gharani, Advanced Excel tricks, Excel online course, Excel tips and tricks, Excel for analysts, Microsoft Excel tutorials, Microsoft Excel, Excel 2016, Excel 2013, Excel 2019, XelPlus, Microsoft 365, Excel 365, excel power query, power query, microsoft, mlanguage, power query advanced tricks
Id: wKglApDFMog
Channel Id: undefined
Length: 17min 4sec (1024 seconds)
Published: Thu Feb 18 2021
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.