How to COMBINE DATA with MERGE and APPEND in Power BI

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
think of these cubes as data tables they have the same shape however different colors and we can place them right on top of each other and that's basically abandoned now over here we also have cylinders different shape but we can match it with the cubes on the basis of the color that's emerging now let's see how it's done in power bi when you combine data in power bi you basically have two options you can either ban data or merge data now let's first explain what the difference is between the two and then go over all of the things that you might run into while using them just imagine the following situation we have salary data for different employees and that we break down this data into separate tabs inside of Excel and one tab for each department so here we have one for finance one for marketing and one for HR and we're gonna connect to these tabs inside of power bi now here we are in the power query editor and we have the HR table the finance table and the marketing table and I would like to combine all three of them into one table now you see that they have the same columns and so here we have HR you in a finance uef marketing and if you want to combine data like this that's a similar structure then you do that using appending okay so you can simply go to the Home tab then here almost all the way on the right we have merge cream ISM band queries now we are gonna use a bank query so that basically means you take one dataset put other dataset underneath it and then the third dataset let me that one okay so you're there's a drop down so we can say append queries or append queries as new now which one to choose append queries is no jet this basically means you leave the other three intact as they are and you're gonna create a new one that combines all three of them if you would just click on append queries then we're going to append the other datasets to the one that's currently selected here you have the choice to choose either two tables or three or more take now here I basically always switch to three or more tables even if I just append two tables but because the interface is just a little bit nicer and here on the left-hand side you see all of the tables that you can choose from you're on the right-hand side you see the ones that you're gonna bend so if we want to Pat here finance and I just select finance on the left hand side and add it to the right-hand side and then the same for marketing and John and once all three of them are here on the right-hand side and you can also change the order simply by using the arrow keys okay so once you have all three of them there simply click on OK and now a new query pops up here and one which we can rename of course so let's call this one later if you append queries then also always double check if it actually worked by counting the number of rows and to see if it corresponds to before and what you're expecting now here it's kind of easy and so we have nine rows and so it works however if you have working with larger data sets you have to go here to transform count the rows and then it will tell you how many rows you have then you have to double check if that corresponds to what it should be now then you can delete that step and then continue all right so this is what abandoned us so let's also have a look then and what merging does now here we have another data set where we are showing the bonus amounts for the different employees for q1 2020 now let's have a look inside of RBI now here we have the same table inside of power bi and what we got to do is we wanted to take that data and match it with the other data that we just appended okay so we have over here the bonus table and then here we have the employee and you might notice that they are having a different structure okay so here I have the part in position name salary and here I have name period bonus okay so what we want to do here is not put these data stands underneath each other but we want to match the day let's do this I'm gonna go over here to the Home tab then above append queries we have merge queries and also here we can either choose merge queries of merge squeezes no so let's go for merge Queens as new as we want to leave the initial queries as they are so let's click on it and here we have at the top the first table so you can either choose the bonus table or we choose the employee table okay and over here let's then choose the other one so that's bonus in my case and then there is also the join times which plays wrong when there's not a perfect match between the two tables that we are emerging okay let's get back to this later because here we actually do have a perfect match and you see already here at the bottom that the selection matches nine of nine rows from the first table and now at this point you're not done yet you see here on the left side we have merged one and here we have everything from the first dataset that was at the top so that is our employee table and then we have this one extra column the bonus column which contains basically nested tables and when you click on the empty space right next to table then here at the button it gives you the matched rows from the other table the bonus table in our case okay and then we still have to say which ones do we actually want to bring over which ones do we want to join okay so that we can do by going here to expand button okay so here in the header of the bonus column let's click on it now here you can then say which fields you want to bring over so all of them and here also if you want to use a prefix or not let's leave everything standard let's click on OK and you see that it brings over all of the columns from the other data set and also has this prefix which probably you don't want to have in this case so let's go back click on the gear icon and then get rid of this bonus prefix and you see that we have also over here the name which we already have in that data sets that doesn't make sense to bring that one over that's okay and there you go we have no your period and the bonus matched from the bonus table okay so what we have over here is now basically totally flat table that contains all of the information from all of the data sets first by appending it and then merging the data from the bonus information so now that you familiar with the two concepts of appending and merging let's go over some of the details all right let's start off with appending now usually when you use a pen name you do not want to load individual tables that you combined but only the combined result itself not to disable the loads to the data model you just go over here to your queries finance in our case disable the Lord you do the same for marketing and the same for HR okay so now these tables will not be loaded okay you can also see that because they are cursive okay and then usually I would take all three of them right click move to a new group and for example call this one input employee table now another question that you might have is if the data sets need to be structured exactly in the same way and this is not the case now let me show you what happens when they are not ok so I'm gonna go over here to my age our input table and let's say that the order of the columns is different okay so maybe here we have to name so we have to begin now that I have changed the order of the columns let's go to the employee table and you see it's still exactly the same we have nine rows four columns now the order of these columns is determined by the first table that you use for appending okay so if you go to the source tab and then here you see the finance table is at the top so it means it takes over here the finance stay with this order as the order for the final table we have over here okay so if I put named here also at the beginning and go back then we have my name here also as the first now what happens if you have columns that are not inside of the other input tables that you are appending well let's have a look so I'm gonna take the name column let's rename this to full name okay then I go back to the employee table and you see that we now have notes because the full name column doesn't show up in the other input tables and effort creates new one fills the rest up with notes and over here we have the name column where we now have knows because in the HR table we now don't have anymore the name come on another topic that might be on your mind is what happens if you have same columns that you have pending but the data types are different okay so let's take the HR table again it's an example go to the salary common I'm gonna change from hole number into text okay now then I switch here to the employee table and you see now we don't have whole numbers anymore but we have undefined on ABC one two three and so here we basically have the whole numbers and then we switched to text for the HR department okay so it doesn't assign the data type and therefore we have to assign a datatype here in the append query another thing that might happen is that some crucial information is actually missing from the data set so here for example we broke it down by different departments and what if there is no department column and then you combine the data well here you see that we have a combined data set but without the department and now you don't know which employee belongs to which department okay now let's go to the HR input sheet okay so you see we only have now named position salary and what you need to do in this situation is that you add another column that returns the department name but you can simply do by going here to add column custom column and then call this one and it's important that when you do it for the other queries that you don't make a typo because then it doesn't match the different columns okay and here we can then use quotation marks and in between these quotation marks we're gonna put a charm and then we're gonna do the same for finance and marketing so now that we have added the different Department columns and we go back to their uncle Year table and now here in the combined data set we have the department column and we can see which employee belongs to which department and what to do if you have data sets that have a similar structure and you want to burn them however the data sets are split over multiple files that are being dropped in a folder somewhere for example maybe have the new actual data coming in for every month or in our case we're gonna bone us information that is being dropped into a folder every quarter and we want to append that with the previous quarter's okay so at the future data sets they are not there yet so we cannot just connect to the next quarter data set now what you do in these situations is that you can connect to a folder and then event all of these files that are inside of that folder now let's have a look this is done you can go here to get data and then choose the folder connector and then just browse to the folder now here we have our data sets which are stored in the folder called bonus just gonna copy the file path from here ctrl-c ctrl-v now here it's very tempting to click on the combine buttons or combined and transform data and actually it would work in this case because the data sets are exactly the same and we only have the files in this folder then we want to combine however if there are different files in the same folder thing you might have a problem and therefore I always go not here to combine a transform data however to transform data and this gives you a view of all the different files that are inside of this folder in which you can combine so here you see we have four files one for each quarter in the 20 and to combine them you can simply go here to the content column and then there's a combined files button let's click on it so here you have to choose the sample file which can be any file in the folder but by default is been first one that it finds and it takes that structure of this first file and it looks for the same structure in the other ones okay so it's not as flexible it's what we have seen before when we use the normal append feature okay so here I'm connecting to that excel file then I'm gonna connect here either to the table or to the sheet and it's important that that name of the table or the name of the sheet is also indeed other ones and so here I'm gonna connect let's say to scene one then she won't also needs to be in the other excel files now the preview looks fine then also here you can say skip files with arrows now if you have this one jacked then even if it finds and say a data set that has totally different structure may might fail then it still continues for the other fans that are in the same folder okay now if you don't have this one checked it just stops okay so usually I would actually have this one select it now let's click on OK just like magic we have mounted combined data set you see we have Q 1 Q 2 Q 3 Q 4 no its first column over here with the source name might not show up for you in case if it doesn't just go here to remove other column step click on the gear icon and then you can bring it back okay now let me go to the last step again now you see here on the left hand side we have one new folder now this folder contains basically all the stuff that is necessary to do that pending all right so you have the parameter sample file a transform file you can just ignore it now if you go back to the query now also here on the right hand side we have new steps that were added that basically not power bi to this end result all right now also here always double check if the Panda actually worked now for us it's kind of easy to see now but that we have thirty-six rows and so it's just small data set now in reality you need would need to go to the transform and then count the rows and see if this meets the expectations okay now then you can delete that step and continue now I'm gonna go to one of my Excel files and let's say somebody would not have called this Siwon but she - okay I'm gonna say that go back to power bi refresh the preview and you see we now have Q 1 data Q 3 Q 4 Q 2 data is missing however 2 Q 3 Q 4 data is still there and that is because I said skip files with powers if that would not have been the case and it would have stopped with Q 1 okay so that's why I make sure that I select that checkbox now what would you do if there are also other files in the same folder well then you need to go back to the source tab and in our example I had it over here who ran the file into the folder okay so yeah let's go random fun and you also see it here inside of power bi now then we can just simply filter it out before we actually combine the files alright and that's also the reason why I always go to edit first before I combine so I go here to the name column and then I can put a tax filter where I say it contains and that's inserted stuff because we know what we're doing here so keep rows where the name contains and then let's say that we have only the files that contain a name bonus click on OK and no story continues just like before all right so you just filter out the files that you don't need and this can also happen when you have maybe word fats PowerPoint presentations or CSV files in the same folder then just put in a filter here on the extension color so also here go to the drop-down tax filter equals and now it would only combine excel files that contain bonus in their name but this way you make your solution a little bit more future proof another thing that you might wonder about is what happens if you have subfolders well basically it doesn't matter it picks up everything in all of the subfolders so if you want to have a nice organize let's say by year 2018 19 20 then it goes through all of the subfolders and all the files and subfolders you will also see now let's also concentrate now on merging data now before we had a perfect match between our employee data set and the bonus data set however in reality it might be that some employees only showing up in the employee table and not in the bonus table and the other way around and then it matters of which type of join you choose so let's have a closer look at the different joint types now here we have the same employee table as before however it took one person out from the finance department and here in the bonus table I added two new people and let's see how that influences the merge so I'm gonna go here to merge queries as new we're gonna have our employee table we're gonna match this one on the basis of the name we have the bonus table okay now before when we didn't change anything for the joint kind we have nine out of nine rows that match now in this case we only have eight of nine rolls that one match now let's go over the difference joint guides okay so starting with the left order which is the default okay but basically means you keep everything from the left table and match whatever you can match from the right table now here there's no left and right however the one that is on the top that's the left table and one that's at the bottom that's the right table okay so let's see what this returns I'm gonna click again here expand and you see everything was matched however we have known for this person over here marketing analyst Rico eunsuh which is only showing up inside of that employee table but not inside of the bonus data okay and that's why we have over here no now let's change left altitude right now you see here at the bottom the selection match is now eight of ten rows from the second time so it takes the second table and tries to match everything that it can match from the employee table so this means that two employees they are showing up in the bonus table but not in the employee table so maybe our employee table is outdated okay now let's click on OK to see what happens ok so here you see we have notes for Christina Lafleur and we have over here notes for Tony Smith ok so these two employees they don't show up over here in the employees let's now switch them to an inner join now that we switch to an inner join you can see here that the selection matches 8 of 9 rows from the first table and 8 of 10 rows from the second table ok now let's click on OK and then expand it again you see we have no notes anywhere we only have the rows where there was a match ok so that's why we only have eight rows in this case then let's switch to an outer join now you see that we have 11 rows so more than the first day but more than the second table and alter joint keeps all of the information that there is so that's why we have here nose for Christina a flat nose for Tony Smith and then also notes for your cleanser and because they show up in one data set but not the other ok so with the full outer join you keep all of the information that is in both of the data sets now let's then also have a look at the last two which are left ante and right and so when you do left ante click on OK you see only one that is not appearing in the second table so that is Beyonce okay now let's do that then also for the right ante okay and here you see Christina Leffler Tony Smith don'ts two employees that I'm not showing up in the other table now these are all of the different joint types that you can find in power behind now you might also have noticed that there is the option to do a fuzzy match across images something that you would use if there is no exact match match between the two tables on the basis of the names okay so the names might have been spelled a little bit differently but you still want to perform the match that's where fuzzy match comes in if you want to know more just click over here on the on the link another thing that you might run into is that you need to have the unique identifier based on multiple columns okay so in our example that could be the first name combined with the last name so here I split the first name and the last name column okay so what you do that and see you just click on the first name hold the ctrl key click on the last name so that they get concatenated and then you match it also over here on the first name and the last name in that order now let's go for an inner join and click OK then expand the bonus column and you see we have no first name last name period and the bonus column so everything nicely got matched not on the basis of this one column but the combination of two columns combining data through merging and depending these key tools in power behind that you really need to know give any specific questions about this topic just let us know in the comment section below and if you like to see more content on power bi consider subscribing to our Channel or give it a thumbs up thank you and see you in the next video
Info
Channel: How to Power BI
Views: 115,252
Rating: undefined out of 5
Keywords: append merge power bi, power bi append vs merge, append tables in power bi, merge power bi, power bi merge queries, append queries in power bi, power bi append queries, append in power bi, merge in power bi, power bi, how to merge in power bi, how to append in power bi, append and merge in power bi, append vs merge power bi, append and merge power bi, power bi append merge, power bi append, power bi merge, append power bi, append vs merge, append vs merge query power bi
Id: 1cPtk-8iK0k
Channel Id: undefined
Length: 23min 16sec (1396 seconds)
Published: Tue Jun 30 2020
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.