Basic Excel Business Analytics #27: Clean & Transform Data: Formulas, Flash Fill, Power Query, TTC

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
Welcome to BI 348 class video number 27. Hey, if you want to download this file, BI 348 chapter 2.5 Excel Data Transformation Methods, the Excel file, or you want to download the PowerPoints, click on the link below the video. Hey, this chapter is not in the textbook, and it is-- well, wait a second. What's this whole series? It's business analytics. And what do we always start with when we're doing analytics? Well, we almost always start with some data set that may or may not be in a proper data set, and it may or may not be in Excel. So of course, one of the most important topics is learning how to import clean and transformed data so we get it into Excel and it is in a form that we can use to do our analytics. Now, in our PowerPoints, I'm going over just a few slides. I want to jump over to Excel. Hey, our topics-- we're going to look at Excel built-in features to clean and transform data, like formulas, the amazing flash fill text to columns, and power query. Then we'll talk about importing data. And for importing, cleaning, and transforming data, there's nothing more amazing and powerful than power query. And so that will be the bulk of this week. We'll just be using power query to get the right type of data so we can do whatever it is we'd like. And we'll briefly talk about data dashboards, and we'll introduce the data model in Excel 2013. Now, here's a list of all of the power query. Because besides this first video, all the rest of the videos are going to be topics about power query. And there they are. There's nine topics. Actually, there's 10, because the first one will involve two amazing examples. Now let's go over to Excel. And now, this video is just going to be how to use regular features, like formulas, text to columns, flash fills, and even power query to deal with data that you already have in your Excel workbook. Later videos will be all about how to go and get data from various sources in [INAUDIBLE] et cetera. Now, here's our first example. We're given a data set. It's a simple questionnaire that we've sent out-- where do you currently live, and where would you like to live? And if I Control, Down Arrow-- so 100, almost 150 answers. Now, here's how the data set comes in. If I build a pivot table, it's not going to have labels that are very helpful for the viewer of the report. So here's our little legend. BC means Big City. C means City, R, Rural, S, Suburb, T, Town. And so we would really like to have these labels in our pivot table. No problem. I'm going to add some helper columns. And I did a little formatting there. And simply use-- well, hopefully when you see this situation here, some stuff-- here's a legend. This is what we really want. You should immediately think this is a lookup situation. So I'm simply going to type equals VL Tab. And our lookup value will be the particular letter. And notice that's a relative cell reference. When I copy down, it moves. When I copy it to the side, it moves over there. Comma, the table-- I'm only going to highlight the first two columns. F4 to lock it, comma, column index number. The second column has the thing I want to go and get and bring back to the cell. So I type a 2 comma. And although this column is sorted here, we could use approximate. I'm still going to use exact match just in case this is not sorted. 0 for false, close parentheses. Control, Enter, copy it over to the side, and then double click and send it down. Control, Down Arrow, and F2 just to make sure we got it right. It looks like it's working. Control, Home. Now I'm going to highlight just these two columns from my pivot table, Control, Shift, Down Arrow, Alt, N, V. And I want it on a new sheet, so I'm going to hit Enter. Now I'm going to drop where would you like to live down the rows. There's instantly a unique list. I drag it over to Values. Now it's counting. I come up to Design, Report Layout, Show in Tabular to get my field names, over to Analyze. And I want to, in the filter group, insert a slicer. And I'm going to say currently live, click OK, and there. Actually, let's do one better, really. We really should have percentages. I'm going to drag length to live down to Values, right click. Show Values has this amazing list here. Percent of column total so each one of these is expressed as a percentage of the column total. And there we have it. When I say big city for my filter, it looks like 33% say I want to stay where I am. 24% say I want to go to a town. I click on City. 44-- wow-- in the city. That's a big number. And you can see the percentages for each. Now I'm going to come down here and double click this sheet and call this something like survey report and Enter. So we talked about V lookup-- very simple. We used a formula to transform our data into something more useful. Now let's go over to spaces. And here's a common problem. Now, I made an exaggerated version here where there's all sorts of extra spaces. And if you didn't know this and you built a pivot table and you dropped that field into the row areas, you would get lots of different Gigi pretties. Because look, there's-- oh my heavens, a bunch of spaces there. There's some spaces from the beginning. So sometimes, you have data, and you need to remove extra spaces. Now, what is the definition of extra spaces? Because look-- right here, it looks like, wow, I have a few spaces between first and last name. Extra spaces are any spaces except for a single space between words. Now, one way to do this is to simply use the Trim function. No, it doesn't give you a haircut. No, it doesn't put you on a diet. It simply removes extra spaces. I'm going to click on the two cells to my left, Control, Enter, Double Click, and send it down. Click in the top cell, Control Shift, Down Arrow-- watch this. Right click, and I'm going to hit the Format painter. Scroll all the way up to the top, and then paste just the formatting. Clean Sales Rep will be our field name. Now I'm going to click on the D and point to the edge with my move cursor and click and drag. Because certainly, that is one awesome option. But watch this. I'm going to copy and paste here and then delete the formula. Because here's a formula. And I better add some green here. And the advantage to formulas is simply that they update. If that data changes, this will automatically update. Not only that, but in this situation, sometimes I think I can type trim faster than I could do some other feature. But again, formulas are not the only way. And in fact, this is probably the most amazing way. I simply type clean sales rep as my field name, and I'm going to use flash field. Now, a flash field only works in Excel 2013 or later. I'm going to type the correct name-- Battalion, space, just a single space, and pretty. Now, what I'm doing if I'm using Flash Fill-- and in fact, any time you type next to a dataset like this, Flash Fill will try to interpret what you type as an example of the correct data type. And Microsoft calls this programming by example. Because Excel will try to, behind the scenes, build a little program to fix the data. But watch this. When I hit Enter-- and now I'm going to type Gigi-- instantly, Flash Fill gives me this ghost list. And because I typed a name that exists right there with no spaces, with a single example, it guessed, and it looks like it's correct. And when I hit Enter, that is amazing. Now, Flash Fill doesn't always work with a single example. Sometimes, you have to give it two or three examples. Control asterisk, and I'm going to add a border. Now, a lot of the examples we do in this particular section, I'm not going to build reports. We're just going to see how to clean the data. So very clearly, there are two examples. Now let's go over to ISO dates. Now, oftentimes, you get dates in this form-- year, month, day. And we need to get them to proper serial number dates. Now, there's a few ways we can do this. There are a lot of different formulas that will do this, and I'm going to show you my favorite one here. Now, this is not the fastest method. However, again, why do we have formulas? In case this data changes, the formula will instantly update. And we're actually going to use the text function. Now, the text function takes numbers or numbers that are stored as text, and with a custom number format, will convert it to that custom number format as a text item. Now, I'm going to say hey, look over there, comma. And the number formatting we're going to use-- and this is really kind of a hack. We're going to use a number formatting that is usually only use for Social Security. I'm going to say 0000, because there are four digits over here that represent year, dash 00, for the month, dash 00. Those are for the days. Now, of course, the text function converts it to text. And when I enter this, that is text, and that is not going to help us. But F2-- any time we have a number stored as text, we can do any math operation on it, and it will instantly convert it back to a number in the math operation. I'm going to use this plus 0. And check that out. That's the serial number. Control 1, and now we can apply whatever date number formatting we want. You've got to be kidding me. Now I'm going to actually move this over here for a second, add some green, double click and send it down. Highlight that whole column, and then move it over a little bit. Control, Up Arrow. And we could go to any one of these and see, sure enough, it has given us exactly the right date. Now, again, that's not the fastest way. Here is the fastest way. And I'm not going to highlight the text. I'm going to highlight the actual date. Control, Shift, Down Arrow. And it is text to columns. And this has been around forever-- if data ribbon, text to columns. It doesn't matter. We're just going to go Next, Next. Because here's the magic button to click. We want to say ours are stored as year, month, and day. Now, notice it says A2. And oftentimes, you absolutely don't want those there. So you'd simply click Finish, and it would dump all of the serial number dates into this column, replacing it. But just to illustrate, I'm going to paste it off to the side, like in E2. And when I click Finish, you gotta be kidding me-- text to columns. So if you're just in a hurry, that is the fastest way. And text to columns goes way back. Now, a more recent way that we can do this is power query. And again, I wouldn't use power query if I just have this data set and I needed to do it quickly. But I want to show you that power query will do it pretty easily. Because oftentimes, when you're importing data, right from the outset before you even get the data into Excel, the dates are coming like this. So power query has no problem. And actually, it's almost as easy as text to column. So watch this. I'm going to highlight that, and I'm actually going to copy it over to the side, because you cannot use power query to get data from Excel unless it's an XL, which means you have to go insert and convert it to a table or Control T. And we'll actually do this in an example coming up. Or you've actually at least one time turned on the filter. Now, watch this. See it says Control, Shift, L. I'm just going to Control, Shift, L. And watch this. I'm going to turn it off. So my trick-- if I'm using power query and I don't need Excel table feature which would create dynamic ranges, I just want to quickly do something in power query. I just Control, Shift, L, L. For that flash of a second, that whole dataset got turned into filter. So watch this. When we go up to power query, we're going to use from table. Because Excel data almost always comes from a Excel table feature, so the ranges are dynamic. But no problem. We use this button, and it will respect that filter. And there they are right there. But notice over here, it took the actual sheet name and said filter database. That's pretty cool. I'm going to leave it like that. All you have to do in power query is click on the column, transform. And data type, I want to say date. And you gotta be kidding me. That is amazing. Now we have to close and load 2. And we're going to do this a lot in this chapter- close and load 2. I'm selecting the bottom one. Close and load just puts it as an Excel table on a new sheet. I'm going to say close and load, because I want to choose Table, and then I can choose whether to put it on existing sheet or a new worksheet. Later, we'll see the amazing only create a connection. Existing worksheet-- I'm going to select, click that button there, and I want to put it in J1. Click OK. Click Load. And there we have our workbook query. I'm going to scroll over, and there are our dates. And now I'm going to close this, control and roll. Clearly, there are multiple ways-- formula if you want it, dynamic, that great-- in essence, a social security-type custom number formatting plus zero. Text to columns is the fastest, the absolute coolest. And then, of course, power query. Now I want to go over to the sheet-- count yes and no. Now, here we have a field that's city and region. And we really want a city here. Formula, Flash Fill, Text to Columns-- they all can do it. Now, if you wanted to do it with a formula, notice that it looks like the first eight characters is Oakland. The first four characters are Reno. So from the left, we need eight characters and then four. Well, we could use the left function, but we need a dynamic number of characters. Search function is the perfect function for this. Equals s, and there's a search and a find that will both search for a particular character, like comma, and tell you the position. Search happens to not be case sensitive. So I'm searching for finding the text in double quotes. I always want to find a comma. End double quotes. Comma within-- boom, that relative cell reference. Close parentheses. When I hit Control, Enter, it tells me it found comma at 8. Double click and send it down. Right here, it found comma at 5. Now we can use that formula result inside of the left, because the left will take characters from the left-- 8, 5. Oh, wait a second. We need to subtract one, the active cell at the top. I'm going to hit F2 and amend this formula, because I don't really want 8. I want minus 1. Now I'm going to populate this edited formula all the way down with Control and Enter. And now we have exactly from the left 7, 4. That will work. F2, and right at the beginning, after the equal signs, left. It wants the text. Boom. Comma. There's the number of characters. I come to the end. Close parentheses. Control, Enter to populate that all the way down. Now, formulas are much more complicated than either one of these methods. But again, that's dynamic. Flash Fill. Oh my heavens. Watch this. Oakland, Enter. As soon as I type in an R, my ghost Flash Fill list comes up. I have given it two examples-- a full give me everything before the comma, and then as soon as I type an r, it's guessing that I want everything before the comma. And when I hit Enter, that is amazing. We can also use text to columns, because remember, Flash Fill is only in Excel 2013 or later. Control, Shift, Down Arrow. And I'm going to use the keyboard here for text to columns in 2007 and later. Alt, A, E. Delimited, we definitely have a comma which separates the two fields in [INAUDIBLE]. I'm going to click Next. We do not want Tab. We want to say comma. And look at that. Now we're not going to need this second column here. So when we go Next, I can simply click on the region. Do not import. Not only that, but if I replaced it right there-- I'm not going to replace it. I'm going to put it in Destination M2. And when I click Finish, exactly the same thing as Flash Fill and R formulas. Earlier versions, you want a quick and easy way, text to columns. Flash fill 2013 or later, and dynamic, use that left and search. Now we want to go over and look at a much more complicated situation. I want to go over to Description. And here it is. We got a data dump into our spreadsheet that has only the description from the transaction. And it looks like we have the department or category, the actual product. Then we have a date, and we have a sales number. And we need to break this apart. Now, we can definitely do this with formulas, but it's going to be pretty complicated. The real answer here, if you have power query, is to use power query. However, we do want to look at formulas to get the date and the number. It's important that we are familiar with text functions. Now, in the prerequisite for this class, Business 218, if I go over to the Note sheet, we covered all these different text functions for extracting data. These are also over in the PowerPoints. Now let's try and get at the date first. I added some field names and some formatting. Now, if we're going to get the date, notice the pattern is definitely dash dash. So if I were to use the Search function and look for this, I might get that. But you know what? I'm actually going to use the substitute function and find the second occurrence of this dash and get rid of it. That'll make the actual text easier to deal with. So I'm going to look at the substitute. And all it does is you say, hey, what the text is, comma, what the old text is that you're searching for, end double quotes, then dash, comma, the new text is going to be double quote, double quote-- that's the syntax for nothing-- inside of substitute, comma. And then look at that. Not all text functions give you the options for instance number-- instance 1, and instance 2. So I'm going to put 2, close parentheses, Control, Enter. I'm actually going to widen this. And then double click and send it down. And you can see, sure enough, it's missing that second dash. Now I'm going to amend this, because I want to use the mid function. And the mid function lets us take from the middle part of a text string. I'm going to use the mid, and then it's going to ask me where is the starting position. And I'll use Search to find that. And then guess what? The date is always going to be 10 characters long. So F2, come here, MID function. The text-- that substitute is our new text, comma. I need to find the dash. And notice the actual dash is in the same location in the original and in our new text. So the start number, S Tab, and for find text for search, I'm going to put in double quotes a dash, comma, within this text right here. Notice again, we're allowed to do that, because the dash is in both text strings in exactly the same location. Close parentheses. If we were having a start number, dash is there. We're going to need to add 1 and 2 to get to the 0, plus 2, comma. Number of characters-- it's always going to be 10. I think it's 10. Yeah. That's 10. So 10, close parentheses, Control, Enter, double click and send it down. We can see it's text, so I'm going to F2, plus 0. Any math operation will convert it back to a number. Control, Enter to populate that edited formula all the way down. Control 1 to open up Format Cells, Date, click OK. Wow. That was pretty complicated. Now, Sales-- notice it's always going to be at the end. But really, we can search for that dollar sign since it's the only one here. And we're going to use not substitute, but replace. Now, replace is slightly different. We'll give it that text. It needs a start number and number of characters. Now, the start number, I'm always going to start at the beginning. So I'm going to type just a 1, comma, and this number of characters, so from 1 all the way up to the dollar sign. So search, and I'm searching for a dollar sign in double quotes, comma, within this text right here. And what this will do-- number of characters-- it will say from 1 all the way up to that, whatever position that is, and we're going to take that increment from 1 to whatever and replace it with double quote, double quote. That will be the syntax we're using inside of replace to say don't put anything. Close parentheses. Control, Enter. Wow. That's pretty cool. And notice we always have our default alignment to the left as a visual cue immediately saying that's text. So F2 plus 0 Control, Enter, double click, and send it down. Control, 1, and I'm going to do something like currency. Click OK. Now, that is definitely quite complicated. And the only time we'd want to do something like that or something like this is when we want it to be dynamic. Now we want to see the power query. I'm going to click on the top cell, Control Shift, Down Arrow, Control, C, Arrow, Arrow, Arrow right here, Control, V. Because remember, if we're going to use power query, this needs to be either a filtered data set or a table. I'm in a Control, T to convert it to a table and hit Enter to get that OK button. I absolutely want to go design and up to Properties. I want to name this Or. If you do this a lot and you're using power query, ALT, J, T, A will get you right up to the properties. Give me a name, and I'm going to call this Descriptions, and Enter. Now I can go up to Power Query from table. And there is our descriptions name. It took for our query name. We're going to click on this. And hey, I think pretty much, we can use home split column. We have-- it looks like a colon and a space delimiter. We've got some dashes. So I think this is going to be quite easy. We'll say split by delimiter. And for the first one, we're going to get at that colon, so I'm going to click the drop down and point to Custom. Right here, I'm going to say Colon, Space. And certainly, this is pretty cool, different than text to columns. We have at leftmost delimiter, at rightmost, and each in currents . We only have one, so I simply need to put-- actually, let's make them a stake here. Watch this. This is another amazing feature about PowerPoint. I'm going to put just a Colon, click OK, and it will obey me and split it. But whoa. There's an extra space there. And here's something awesome. I don't know how many times I've used this learning power query. You can simply x this out, x this out. And notice it did two steps. And it will totally remove those and get us back. So now when I click the column, Split, Delimeter, Drop Down, Custom, colon, and a space-- I'm putting two in there-- when I click OK, that is beautiful. No extra spaces there. Now let's click on this column. And I want to split it with a space, dash, space. And notice the date and the sales amount are separated by the same series of delimiters. Split by Delimiter, Custom, space, dash, space. And when I click OK, boom. There it is, split apart. Now, we want to make sure that each one of these has the correct data type. So data is usually the one that gets us in trouble. Those are both text. Now we can double click and name the columns. Department, double click. Product, double click. Date, double click. Sales, and Enter. Now we have our steps. We have our name. Close and load to table. We're going to dump it into our spreadsheet. Later, we'll see this awesome option here for data model. We're going to put it on this existing sheet and click the Collapse button. And we'll put it somewhere like G1. Click OK. Click Load. And sure enough, there it is-- 68,000 records split apart into our four fields. Power Query certainly is easier. And we want to check out the fact Power Query is dynamic also. Now, this dataset over here-- if I come over and change this first transaction to 1,000 to 2,000, when I hit Enter, formulas are instantaneous. There we go. Now, over here, if we change this to 2 2, oh, it doesn't update. No, no. It's just not instantaneous. You have to come and right click Refresh. And it actually goes through all of the steps of taking the data, transforming it, and bringing it over Here. Now, actually, this is sort of a silly example, because the only time you're going to get a data set like this is from a data dump where this is the description from a transaction or something like that. But nevertheless, we saw how to, with formulas, extract data from, in this case, the middle and end, and then we saw how to take the same exact descriptions. Power query made it a lot easier to transform all of the columns. So in this video, this first video for this cleaning and transforming data section, we saw how to do a power query and some formulas on a description. We saw back on the count yes and no. We saw how to do formulas, Flash Fill, and text to columns to get just the city from City and Region. Back on I SO dates, we saw how to use text to columns, formulas, and Power Query, noting that if it's a one-time deal, text to columns is pretty fast and easy. We saw how to use Flash Fill and the Trim function to get rid of extra spaces. And then back over here on unhelpful labels, sometimes we get data, and if we make a pivot table, those are not good labels. So we simply used V Lookup to help transform our data. Now, next video is going to be our first of many Power Query videos about importing, cleaning, and transforming data. We'll see you next video.
Info
Channel: ExcelIsFun
Views: 53,195
Rating: 4.9170985 out of 5
Keywords: Excel, Microsoft Excel, Highline College, Mike Girvin, excelisfun, Michael Girvin, Mike excelisfun Girvin, Slaying Excel Dragons, Excel Magic Tricks, Ctrl Shift Enter Mastering Excel Array Formulas, Array Formulas, Business Analytics, BI 348, Data Analysis, Clean Excel data, Transform Excel Data, Formulas, Flash Fill, Power Query, Text To Columns, VLOOKUP, TEXT function, REPLACE function, SUBSTITUTE unction, SEARCH function, Split Region and City from Same Cell, ISO Dates
Id: qE09ZtiaZl4
Channel Id: undefined
Length: 29min 48sec (1788 seconds)
Published: Sun Oct 18 2015
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.