Highline Excel 2016 Class 18: Clean & Transform Data: Replace, Flash Fill, Text To Columns, Formulas

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
Welcome to Highline Excel 2016 class video number 18. OK, if you want to download this file-- Busn218-Video18 or the PDF file-- click on the link below the video. Hey, we got to talk more about cleaning and transforming data. Now, back in video number 3, we already saw how to use Power Query to clean, transform, and import data. But sometimes we don't need Power Query because Find and Replace, Flash Fill, text-to-column, and formulas can do the job more efficiently. Now, for example, Find and Replace-- if it's a simple character, we want to use Find and Replace-- not Power Query. Flash Fill up-- we have a simple data cleaning task, Flask Fill is amazing. Text-to-columns has a few amazing efficient examples of when we should use text-to-columns, and not Power Query. And of course, formulas-- the hallmark of formulas is if you want the solution to update instantly, formulas are your ticket. Now, let's go over to this sheet Replace and see our first example. Now here we have some data we need to clean. Here is product codes our system doesn't use a dash it uses a forward slash. So we simply want to find all the dashes and replace them with forward slash. We can highlight to isolate this range. Home, Editing, Find and Select-- there it is, Find. There it is, Replace. Now, the keyboard for Find is Control-F. If you only memorize that-- because Control-F works in all systems, including web pages and Word. But if you remember Control-F, you can always click back and find Replace. Now, of course, there is a keyboard for Replace. It is Control-H. Now find what? I'm going to type a dash, tab, forward slash. And I could say, Find Next. You might want to do that if there were dashes, sometimes you didn't want to replace. But in general, we're using this feature because we got this output from a different system. We know there's always dashes. So we simply click Replace All. And instantly, it says replaced all 14. Click OK. Click Close. Now over here, I want to try this with Flash Fill, which is another way we could solve this problem. In our prerequisite class we used Flash Fill. We haven't seen it in this class yet. But it is amazing. I'm going to type Carlota and then simply type my forward slash 3039 Enter. And as soon as I start to type the next example, if there is a consistent pattern Flash Fill gives us this ghost list with suggestions. As soon as I hit Enter-- boom! The column is filled. Now, would we use Flash Fill or Find and Replace? In general, if it's a simple single character like this, I think I could use Find and Replace more quickly than I could type everything out and use Flash Fill. But both will work. Now I'm going to delete this. We can also use Find and Replace, Control-H, with not single characters but multiple characters. So if I wanted to find the dash and I wanted to replace it with space, forward slash, space-- now, I can click Replace All. 14 replacements. Click Close and boom! There it is. So Find and Replace, what does it do? It replaces one or more characters with a new set of one or more characters. When to use it? When you have a simple set of characters you want to replace. Now let's go over to the sheet Flash Fill. We'll talk about Flash Fill. What does Flash Fill do? It can clean or transform data based on an example you give it next to the data set. When do you use it? When you have a quick one-time data cleaning task. For example, here's first and last names. If we simply want first-- I'm going to type M-O Enter. As soon as I type a G, I see that ghost Flash Fill list. Enter. And there, I have extracted first name. Now how does it do that? Microsoft calls it program by example. That means behind the scenes, as soon as we gave it an example, it built a little program to extract first name. Now the reason that this worked so easily is because there was a simple pattern over here, a space separated the two words in every single instance. Now I'm going to scroll down here. Here's a similar example, but notice we want to get the initial for the first name-- so S, J, G, and the last name Ripper, Mac. But notice we don't have a consistent pattern. So in order to get Flash Fill to work, you have to do two things. First is you have to be familiar with your data in the column. And you have to give it enough examples for it to learn the pattern. Now, notice our situation is 1, 2, 3 words with 2 spaces, 3 words with 2 spaces, 2 words with 1 space. So this is a small column. So we can visually see that there are only two situations. If you had a huge column you would have to be familiar with what data is in the column. Now here, I'm going to give it S-- first letter of the first name. And then Ripper, the last name. And I'm going to choose to go down two, because there's only two examples. I have to give it an example for both. So G and then Tran, Enter. Now notice there's no ghost list. So we're going to have to force Flash Fill to look to the left-- look at our examples by going to Data ribbon tab. And there it is, Flash Fill or there's the keyboard Control-E. And just like that, we have extracted first letter of the first name and last name. We knew the data. We gave it enough examples. Flash Fill works like a charm. Now, what if we wanted the last name and change the case? Well this one's easy, right? Jim. Enter. And I'm just going to use Control-E. What if we wanted the initials? M space D and I'm going to assume that it will guess it right-- Enter Control-E. Whoa! That's one drawback with Flash Fill, it is trying to build a program behind the scenes and sometimes it gets it wrong. Control-Z-- Delete. Let's try this again with a ghost list. Capital M space D, Enter. Capital G, oh, look at that. The ghost list got it wrong again. Space, but now I'm going to insist that we use T. And notice, it's got that R there. I'm going to delete it. Now when I hit Enter, I'm pretty sure if I Control-E that will work. Sometimes you have to mess around with it. Now here we want to reverse the names, insert a comma and a space, and change the case. So I'm going to type Dim comma space M-O Enter. And I'm going to try Control-E. And sure enough, that works. So if you have a simple situation, Flash Fill is amazing. And here's another example, if we wanted to insert parentheses and a dash for our phone number, not using custom number formatting or something like that. I can simply type open parentheses 206 close parentheses space 587 dash 4545 Enter Control-E. It should get that one right, and it does. Now here's another situation similar to this one. If we want the last color-- if we were to just type red Enter Control-E. Of course, it would assume we wanted the first one. So we need to give it two examples. I'm going to give it red. And then green. And sure enough, Control-E, it should get it right. We gave it enough examples, meaning red green. It knew what to do, which was get everything after the dash. Another example of when Flash Fill is just awesome. We have asset ID and in order to import this into the system it requires a lead apostrophe so that it knows that its text. I'm going to type a lead apostrophe 90513 Enter. And this is simple enough. Control-E. And sure enough, if you click in every cell, you can see it's got that lead apostrophe. Another example of where Flash Fill is better than anything else. We're not using Power Query or formulas. This is simply something I dumped and all I want is the number. It's consistent enough. I type 1574 Enter Control-E. And boom! There it is. All of the views extracted. Now let's go look at an example where Flash Fill might not be so good. Here we have some comma separated data. The delimiter is a comma. I have 1, 2, 3 fields. If I wanted to separate this out into three separate columns, I could try to use Flash Fill. Carlota Enter. I'm going to Control-E because this one's easy. This is, hey, get everything before the comma. But watch what happens for date. If I type 05 slash 05 slash 2016 Enter. If I were to Control-E, right here-- what in the world? Dates are known to have a problem with Flash Fill when it has leading zeros. Control-Z. The only way you can use Flash Fill that I know of in this situation is you'd actually have to highlight all of the cells, Control-1 and pre-apply the right number formatting. So Custom and I'd have to come here and do mm for months, with a leading zero slash, dd, leading zero for day, slash yyyy Enter. And now if I come here and Control-E, it knows what to do. That's a lot of hassle. If we were still using Flash Fill to try and separate this out, the region right here, if I typed west and Enter Control-E. Of course, that will work. And if this was our goal to get product, date, and region we could certainly do it that way. Our next example, we're going to learn there's a feature called text-to-columns that is much better for this particular situation. All right, so Flash Fill-- we definitely want to use Flash Fill to clean. Not often is it going to transform data sets, but the cleaning part-- absolutely. And it's a quick one time cleaning task. We don't need it linked like we could do with formulas or Power Query. Now let's go over to text-to-columns. Now what does text-to-columns do? It splits a single column of text into multiple columns based on a delimiter or fixed width. Now fixed width, we almost always can use Flash Fill. Like for example, extracting state abbreviation-- we just type those two letters, Control-E, and we're done. But delimiter-- we just saw in our last example how Flash Fill had trouble with splitting the data into multiple columns. Ah, but text-to-columns does two things that nothing else does better. It can take text dates such as ISO dates and convert them to serial numbers more quickly than any other feature. And text-to-columns can take text numbers and convert them back to numbers faster than any other feature. And when to use? Well Flash Fill for breaking apart certain data and Power Query for separating and cleaning mostly replaced text-to-columns. But as we mentioned up here, there's a couple of really good uses. Now, let's start over here with the example we just did with Flash Fill. Here's our column and we want to break it apart based on the delimite we have here, which is a dash. And we don't want the second column, which is the product. We want the department, the date, and the amount. Now text-to-columns, you highlight the column. We do not need a field name for this feature. Data and there it is text-to-columns. We can also use the keyboard Alt-A-E. And there's our dialog box for text-to-columns. It's been around for decades and it is amazing. Step 1 is always what is the delimiter? What is going to split the data? Either delimiter or fixed width. Delimiter. Now notice the preview down here. Let's click Next for step 2. Step 2 is just, what is your delimiter? Now notice the preview. I'm going to uncheck Tab and say Other and do a dash. And just like that, our preview shows that it's broken it apart into 1, 2, 3, 4. Now, one limitation for text-to-columns is we're only allowed one delimiter. If we use Power Query-- earlier in the class we've had a Power Query example, where our delimiter was multiple characters. In the old days, before Power Query, we'd have to run text-to-columns multiple times when we had multiple delimiters or do other tricks. But here, it's broken it apart perfect. So step 2 is delimiter. Next. Now, step 3 gives us the options of a couple things. We can format and we can tell it to skip certain columns. Now let's talk about this format. Look at the default, it's General. And you can read this General converts numeric values to numbers. And what they really meant to say was text numbers back to values, and date values to dates, which really should have been text dates or ISO dates back to serial number dates. That's why this feature is so amazing. The default is set up perfectly to handle text, dates, and numbers. The other thing is when we get to our ISO data is we can actually use this feature and tell it what form of date we have. Now for us, we're just going to leave it as General. We want to come down here. And for the second column-- notice we can click on whichever columns we want and change the formatting. We say Skip for our second column. And there it is, it says skip column. Now we're ready to go because this General works perfect for our three text, date, and number columns. Now the other part to step 3 is very important. If you want to replace it like with ISO dates then that's the default. It actually says I will replace everything in this column, but that's not what we want. Notice the collapse box. I'm going to click inside here and very carefully click-- and I've already put the field names, right? I'm going to click right below Department in cell L16. When I click Finish, text-to-columns that is beautiful. Now remember, we could have done this with Power Query. But if you have a column and all you want to do is split it apart and you don't need a link to the source data, text-to-columns is simply amazing. Here are the other two great uses for text-to-columns. I'm going to highlight this whole column of ISO dates-- year, month, day Alt-A-E. I don't have a delimiter so I don't even need to worry about step 1. I still don't have a delimiter so I don't need to worry about step 2. It's only step 3, and here's where I go to date and I'm going to select year, month, day. I'm definitely going to keep the destination as the top cell in the column I highlighted. That means it will replace those dates when I click Finish. Boom! Just like that it has converted those. Now, here's text numbers. And the reason that there is nothing that can be text-to-columns is because if we know the keyboard, Alt-A-E-- and I'm going to do it slow this first time. Remember, we don't have a delimiter here. We're just going to use that step 3. So I don't need this step. I don't need this step. And remember, General is the default, which will convert text numbers to proper numbers. So watch this. Escape Alt-A-E-- Notice that Alt-F will get us to finish. And the funny thing here is that in order to get to that underline F, we have to hit Alt-F a second time. So you're ready? It's Alt-A-E Alt-F and done. So text-to-columns pretty amazing! If we want to split a single column and we don't need it linked or we need to take text, dates, or numbers and convert them back to proper dates and proper numbers, text-to-columns rule. Now I want to go over and talk about formulas. And I actually want to start on formulas, too. Because we've done a lot already in this class. In video number 8, we talked about text formulas and text functions. And in video 11, we talked about lookup functions. In both of these videos we saw extensive examples of formulas to convert data sets. Now we had all sorts of examples in video number 8, of extracting state with left, zip with right, first name with left in search, and a bunch of other examples. Including the ever important Trim function for cleaning data. Each individual example is an example of cleaning data. Together we, in essence, transform this data set into a data set that we can use for data analysis. We also very importantly saw lookup functions. Now lookup was one of our functions. But we saw many functions-- VLOOKUP, HLOOKUP, straight lookup. But we are converting this data set into a different data set with expanded columns. We had looked up the price, we calculated the revenue-- just that alone is transforming our data set into something we can use, for example, for a pivot table. We also saw an example where we use index and match to flip a table upside down, which is another example of transforming an original data set into some new data set with formulas. But now, for our example in this video, I want to go over to formula 1. What formulas can do? Well they can clean or transform data sets. When to use them? When the data is in Excel already. We don't need to import it. If we needed to import it, we'd be using Power Query. We don't have big data. You don't want to use formulas on very large ranges of cells because then calculation time slow down. Again, what would we use? We'd use Power Query and PowerPivot data model when we have big data. And when do we use formulas? The hallmark for formulas is when you want the solution to automatically update without having to use the Refresh button. So here's an example. We have a template for grading, and we always just dump our student names right into the first column. And I want the formula to automatically get first and last. And then I enter my grades and so on. Here's a perfect example, where I want to, in essence, clean the data, extract first and last, and have it always automatically update. Well the first is easy. We simply say, hey, Left-- I'm going to look at that text right there. The pattern for first name is easy, it's everything before the first space. So for comma, number of characters were getting from the left, I need to Search for a space in double quotes within that cell. Remember back to video 8, Search tells us the position of that space. So counting from the left, it would tell us the space is in the sixth position. That's not how many characters we want from the left, we want one less than wherever the space is. So I subtract 1 close parentheses. And that's our formula that will always work for first name. Now a last name, when we do not have a consistent pattern is a pretty tricky formula. Because notice, two words with a space but sometimes we get one, two, three words with two spaces. Now one option is to use this huge formula. And it's a fine formula, lots of people use it for this. I'm going to show you a slightly different version. Now this is going to be conceptually kind of sideways and unusual. But it's one of the more efficient formulas and you can see it down here. It's one of the more efficient formulas I've seen for exactly this situation, where we sometimes have a middle, sometimes don't have a middle. Now, the first thing we're going to do is we're going to use this substitute function to find the space and insert a whole bunch of spaces. So I'm going to use substitute. The text I'm looking through is that, comma, the old text is a space in double quotes, comma. And now the new text-- the length of the text in the cell, but all as spaces. So I'm going to use the Repeat function. The Repeat function can repeat a single space, however many times you tell it. We're going to tell it, please look at the length of this. So it'll always count exactly how many characters that there are, that's what the length does. The repeat will then repeat that space. And I'm going to hit the F9 key just to look. Look at that! So now substitute will find a space in here and always put a count of spaces that is exactly equal to the length. Now I'm going to Control-Z. When we close parentheses Control-Enter, we're going to get something ridiculous. Double click and send it down. I'm going to the second one, F2 highlight F9. You could see that's ridiculous! Why do we do that? Well, now we can go from the right. And if I say Right, the length of whatever that is, it will always end up somewhere in this sea of spaces. And then it will have Pham with a bunch of extra spaces and we can use the Trim function. So Control-Z, right of all of that text right there, comma-- how many from the right? L-E-N. LEN close parentheses close parentheses Control-Enter double click and send it down. I actually forgot this. I should go up one. And now I have the last name every single time. F2, I just need to give it a haircut with the Trim function. No, no the trim function removes all the extra spaces-- close parentheses Control-Enter double click and send it down. That is a pretty wild formula. Now if you count the number of functions this is six functions. This one over here is seven functions. I kind of like this one. But there it is. There's an example of formulas to clean our data. And were using the formulas because we always want it to update. Next time we dump our names here, instantly these will update. All right, in this video we talked about cleaning mostly and a little bit about transforming with either formulas, in particular when we want everything to automatically update. We saw text-to-columns, in particular when we have text dates or text numbers or we're splitting data's based on a delimiter. We don't need to link like in Power Query. Flash Fill when we're cleaning data. And Replace when we have a single character in a column or a range that we need to replace. All right, now next video we're going to talk about the amazing Advanced Filter feature for transforming data sets. All right, we'll see you next video.
Info
Channel: ExcelIsFun
Views: 21,802
Rating: 4.9615383 out of 5
Keywords: Highline College, Excel 2016, Mike Girvin, Busn 218, Spreadsheet Construction, Highline Excel 2016 Class, excelisfun, Intermediate Excel, Advanced Excel, Basic To Advanced Excel, Clean & Transform Data, Replace feature, Find and Replace, Flash Fill, Text To Columns, Formula for Last Name when some names have middle name and some do not, SUBSTITUTE, REPT, LEN, split data apart into multiple columns based on a delimiter, convert ISO Text Dates to Serial Number Dates, Right
Id: g3nN1P-vxiE
Channel Id: undefined
Length: 24min 14sec (1454 seconds)
Published: Fri Jun 03 2016
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.