Learn Power Query & Automate Boring Data Tasks in 15 Minutes!

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
so you're thinking Excel has all of these features power bi has all of these features what am I supposed to focus on here is my answer focus on power query this single skill alone can transform the way you look at your data so in this video let me demonstrate how power query can truly change the way you work with your data sets using five real world examples plus a bonus example let's go our query is a data pre-processing engine we can use power query to take any raw data and make it ready for the analysis so here I have put some sample data and we will use this to understand what power query could do for us even though this data looks clean it still requires some cleanup activities to be done on it so these are the cleanup activities we want to check for blanks we want to check for duplicate records we would like to then create a student ID from the given name so for example here I have got Haley and Cliff and student ID for Haley would be Haley a we also want to figure out what the term start date is and what the fee status is here at awesome Academy we are running two terms the first term starts on February and the second term starts on 1st of April So based on the registration date if the date is below or before 15th of December then they will start their term on February else they will start in April so we would like to calculate that and finally based on the fee payment status we want to know whether somebody has made full payment part payment or no payment the full payment is 500 anything less than that is part and if no payment has been made then they we need to know that as well so those are the cleanup activities let's go and use power query we can think of this raw data file one skill you should focus on as my source data so I'm going to close this and open a brand new Excel file in this file we would like to connect to the source data file bring the data but also make sure that those cleanup activities are done and then produce the finalized data here which we can then use to do some data analysis work so to activate power query you just go to the data ribbon and here in the corner you will find get and transform data options this is available since Excel 2013. so almost 10 years now but depending on which version of excel you are using the buttons may look different or they might be in slightly different parts of the screen so from the data we can get the data from many kinds of systems our data happens to be in an Excel workbook but you can see that for example I can use power query to bring data from a PDF a SharePoint folder a database such as SQL server or a cloud system such as azure I'm gonna go to file and then say from Excel workbook point to the file that has the data in it I have provided a copy of the sample data file in the video description below so go and pick it up from there and click on the import button this will open up a navigator screen and then shows you the data set here my worksheet is called ring of power and it shows me the entire worksheet not just the data this is kind of okay but there is a problem because our data actually begins from column number three row number four so it starts somewhere here but Excel or power query in this case reads the whole thing so it shows me the all of it this is fine we are going to click on the transform data to jump into Power query so that we can clean up this data this will open up the power query editor screen using which we can tell Excel how we want to clean our data or how we want to calculate those extra values as I mentioned earlier power query is a data pre-processing engine so here you can ask power query to do certain tasks and it will record these tasks as steps on your data so anytime you want to rerun them you just trigger the refresh process either from Power query or from Excel and it will rerun all of these steps on your fresh data so here the first thing that we would like to do is make sure that these empty columns are gone and we are only reading the data our data begins from column 3 to column seven anything else is not required so I'm just gonna go to choose columns and unselect everything and then just select these five columns next up we would also like to discard the first three rows so that we are only reading that little table of the student information so here from the home ribbon you can click on the remove rows button and remove the top three rows at this point what our query has done is it has taken out those excess columns and rows and just gives you the data that you want now our Header information is row number one so we would like to take these values and make them headers this is a simple step go to home ribbon again and click on the use first row as headers option if you pay close attention to the steps area here you'll see that each thing that you are doing is recorded as a step within power query this is how if you want to repeat these steps because you have got new data all you have to do is replace the original file with a new file and then refresh and Excel will again go and run all of these steps even though I'm demonstrating part query with Excel whatever we are doing here can also be done the same way within power bi and that is really why I like power query because it is the same software that you can use both within Excel as well as in power bi to solve all our data problems all right let's do the first thing which is looking for the blanks we can see that whenever the course is blank it is showing up as blank space but whenever the registration date is blank it shows up as null this is a common problem in the data situations where sometimes blanks appear as null sometimes they have an empty space there whatever may be the case if there is a blank row like any blank values I just want to delete the whole thing let's start the process with the registration date click on this little icon to open the filters and uncheck the null value so any registration date that is empty we would like to delete that entire record because they are not valid students in our system that should have cleaned up some data but we still have some more blanks here so I'm gonna do this for course as well go to the course and any course which is blank we are going to take away those records as well mind you your original Excel file doesn't change it will still have all the data all the time it's just that power query here is adding a pre-processing step where it is ignoring those records if you want to do something else with the blanks instead of taking them out you can also do that with power query now that we have handled the blank values let's check for duplicates we can use either name or email address to check for duplicates I'm going to use email address because this will give you a better chance of capturing the true duplicates so we can select this column and again you can do many things for example you can handle the duplicates by marking them or you can delete them for the purpose of this pre-processing of data I would like to again delete all the duplicate records so select this column right click on it and then you will find remove duplicate option just click on that and any email address that is duplicated will be deleted now that we have removed both blanks and duplicate values let's go ahead and generate that student ID which is first name and then the very first letter of the last name because the student ID would be an extra column we can go to the add column ribbon and from here you can add a column as you learn power query you will understand how to use the functions within power query to develop such columns on your own but because we are still understanding what power query is we don't know what functions to use so in this case I'm going to use column from examples option and then ask power query to develop the formula based on the logic that I am giving so when you click on that power query we'll add a column here and asks you type some values I'll figure out what logic to use so I'm going to type the value Haley a that's the ID for this person and this one is Army r at this point power query seem to have figured out some but it got here wrong it says Ramon e whereas we want RAM on R so I'm going to type this again [Music] and that seemed to have done the trick it has figured out what to do and filled those values all the way through this is a bit like The Flash Fill feature of excel but the cool thing with power query is it is actually writing the formula for us so that means if and when our data changes or updates our query will rerun the same logic and generate the first name last name first letter or automatically for us so if you read the formula here this is the formula that it is writing it's saying let split name splitter split text by delimiter space name so essentially it is taking the name and it is splitting the name into individual chunks based on the space as the delimiter so it gets the first name and last name and then it is combining the full value of the very first split and then just the first character text Dot start gives you the first character of the second part of the name so that's what it has done this is exactly what we wanted so if I click ok I'll get a custom column that would have my student ID this is good enough for us now let's just name this column as student ID so I'll double click on the custom and then type student ID next up let's figure out what would be the course start date based on the registration date remember the logic is if the registration date is prior to 15th of December then the start date is 1st of February else it will be 1st of April so we'll select the registration date column again from add column we're gonna use the conditional column option and this would be the column name is course start and here we can build a condition like an if then condition so if registration date now I want to say prior to or less than but it says equal to does not equal begins with ends with like that what seems to be happening is power query is thinking the registration date is actually a text value so I'm going to cancel this out and first ask power query to treat the date as an actual date value so here it tells me that for query things this is an abc123 which means it's an alphanumeric value so I'll right click on the registration date change type to a date value now all the dates are converted automatically to a date format here I'm using month day year format but whatever is the Locale of your computer that is how these things will appear now that they are appearing as a date let's go back to the conditional column again and course start if registration date now I can say is before and then pull out the calendar pick 15th as the date then the output would be 1 Feb 2023 so anybody who registered prior to 15th of December goes into the February batch of the classes and then anyone after that we'll go into one April 2023. click ok now those dates are also added and again for query has converted the date format but it says abc123 so I'm gonna right click change type and date this as well so we have calculated the course start if you want to put some other wording here like semester one semester two you you could do that as well our last thing is checking the fee status and putting a remark against the student so if it is full payment made part payment or no payment again we can use the conditional column for such things so if you click on conditional column fee status and then here I can say if we paid now again we have the same problem which is equals does not equal the fee paid column is not being treated as a number so we are not able to do numeric operations so you need to cancel this out right click again and then change this to a whole number and now we are able to work with that this brings us to an interesting point which is our query is a strongly typed system what it means is for every piece of data that power query works with it wants to know what type of data it is is it a text value number date time what not so you want to specify the data types clearly early on in your data pre-processing stages so that you can work faster within power query once that is done we can now go and add a conditional column V status and then fee paid is let's start by checking with is it null so fee paid equals and then here I'm just going to leave that one as blank then output is not paid then we can add one more Clause if we paid equals 500 then full fully paid else they have made a part payment so partly paid let's click OK here part query expects a value instead of blank so I'm going to type null instead of anything when you are typing null values within the boxes you need to type them in small letters and click ok so here we will get a fee payment status for all our students whether they have fully paid part paid or not paid this is perfect let's go and load this data to excel before we do that I would like to make a name change the ring of power sounds a little bit fancy so I'm gonna go and name this as students and that would be the name of the data set that goes to my Excel now you can go to the home ribbon and click on this big close and load button this will close the power query editor and then brings the process data to Excel for us to analyze now that the data is here in a nice tabular format with the table name as student I can use this to build either formulas or pivot tables so let's make a quick pivot table to see how much payments have been made by each of the courses and how many students have enrolled in each of the courses so I'm gonna click on the summarize with pivot table option from my table click OK and here let's put course into the row label area and then fee paid into values as well as email into values so we have 93 students analytics is our most popular program with 33 people followed by leadership design and technology and finally accounting and that's how much we have collected in terms of the fees you might be thinking what happens when the data changes so I'm gonna quickly save this file and open the original Source data file so this is the source data file let's go and take some of these part payments and change that to 500 for these three guys they're spread across that and then these two also 500. likewise if you want you can add some extra rows at the bottom and when you finish this job save this file close the DOT and then come back to the file where you have your power query data to update the process again you don't need to go into Power query you just right click and refresh it has refreshed and we can see that you know these 500s are now appearing if I go to my Pivot and refresh this I'll see my totals change this is an awesome way to work with data and set up a process so that it can run independently whenever your data changes I hope that gave you an idea of what power query is capable of we have barely scratched the surface of power query though it can do so much more if you want to learn a little bit more about pacquery check out my part query playlist that is shown on the screen here I also have a detailed power query course that I offer on my website if you want to learn more about that click on the card that is shown here I'll catch you in one of these two places bye
Info
Channel: Chandoo
Views: 283,604
Rating: undefined out of 5
Keywords: chandoo, chandoo.org, Excel, spreadsheets, what is power query, most important data skills to focus on, what are the important data skills I should learn, how to use power query, How is power query used in business, power query and data analysis - examples, top technologies to learn for future, trending technologies in 2023, chandoo.org power query
Id: UAFExySaSPY
Channel Id: undefined
Length: 18min 44sec (1124 seconds)
Published: Tue Jan 10 2023
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.