E-DAB 02: Data, Proper Data Sets, Excel Tables, Logical Tests, More

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
Welcome to E-DAB video number two, Data Analysis and Business Intelligence Made Easy with Excel Power Tools. And in this video, we got to talk about data, tables, sorting, filtering, and the ever important logical tests. Now, for every video in this class, you can download the file and notice it says start. That means the file we're going to use when we start the video, there is also the same file name, but it says finished. That's the file you can download as an example of what it looks like after we've completed the video. Now, here are our topics for this video. Also, for every video in this class, these objectives are always below the video. This is Excel Basics 1, not our current video. But if you look below our current video, you'll find this show more button. When you click, there's a time hyperlink table of contents. And we want to start off by going to the sheet Excel. So I click on the sheet Excel. Now, just in case some of you joining this class have never used Excel, we're going to start off with just a brief introduction to what is Excel. This section of the video will also help us understand how Excel interprets the raw data we enter into an Excel spreadsheet for our data analysis. Now, the essence of Excel is that there is a column. This is column I. There is a row. This is row 4. And the intersection is called a cell. That's the cell I4. Now, cells will allow us to hold data, make formulas, pivot tables, and things like that. Now, all of the cells together, that's called the worksheet. Now look down here. There's a bunch of different worksheets, also known as sheets. I can click with my mouse on data, data or information, proper data sets. Each one of these is a new worksheet. Now, all of the worksheets together, that's called the workbook. Now, the workbook or file name is up in the title bar. Now, navigating through these sheets, we can certainly use our mouse and click. We can also use this scroll arrow, click, click, and notice it moves the sheets, but not the active sheet. I can click to scroll back. But here's the cooler trick, and if you hover your cursor, you can see, it has two helpful hints. You can use control left click to scroll to the last sheet, or watch what happens when we right click. That is amazing. And because we'll have workbooks with lots of sheets, this is very handy. If I want to jump to homework problem number one, I click. Click OK. And the active sheet is now homework one. Right click, scroll up. Excel, click OK. Now, if we're going to start putting things into the cells, we got to talk about the amazing default alignment for data in Excel. Text is always aligned to the left. Numbers, dates and times are always aligned to the right. Boolean values, trues and falses, are always centered and capitalized. Now, if I type Excel, as soon as I hit Enter, I know that Excel thinks that this is text, because it's aligned to the left. I type a number, Enter. Look at that, aligned to the right. I know the default alignment is signaling that Excel thinks this is a number. Now, this default alignment can be very helpful. Notice, numbers, dates and times, all have to be aligned to the right. If I were to enter 43..15 by mistake, a type in error, when I hit Enter, there's my visual cue that Excel does not think this is a number. That will not work in formulas, pivot tables, and so on. Dates and times are similar. If you don't enter them correctly, Excel and the other tools will not understand them as dates. So if by mistake, I type 18/2/2019, let's change the column width between I and J, hover my cursor, there's the right cursor to click and drag. My particular regional settings in my control panel expect month, day, and year. Now, some parts of the world, the regional setting, would accept this as a date. Time is similar. If I enter 8:00AM and hit Enter, there is my visual cue. Excel doesn't think that's a time value. Now, of course, the correct way to enter these is 43.15, Enter, 2/18/2019, Enter, and then 8:00 space AM. Now, watch what happens, just like these Boolean values, if I hit Enter, it automatically capitalizes. Also if you enter true or false, all lowercase it automatically assumes it's a Boolean, capitalizes, and centers it. Now you might be wondering, I thought we typed in 2019. What happened to the 20? Well, if you select the cell and look up to the formula bar, look at that. Something's being displayed on the surface of the spreadsheet that's different than the actual item in the cell. That's our next important topic, number formatting. If we go to the Home ribbon tab, number group, this is all number formatting. Now let's select cell K13, 43. And instead of using Enter, I'm going to use Control Enter, because I want to put the thing in the cell and keep the cell selected. Now, I immediately want to look up to the formula bar, OK, two digits, two digits, Enter, 43 Control Enter. Two digits, two digits. Now, I want to use number formatting to display this number with two decimals showing. So up in number format in group, once and twice. We accomplished our goal. We have displayed this number with a decimal and two zeros. If we look up in the formula bar, we can see the underlying number was not changed when we applied number formatting. As another example, if I type the decimal 0.0145, Control Enter, that's a tax rate, we would like to use a number formatting to display it as a percentage. So I select a percentage number formatting. Now if I decrease the decimals, it's displayed as 1%, but number formatting, none of this up here actually changes the underlying number. We can see up in the formula bar the 1.45 is still there. That number formatting didn't change the underlying number. Now, why is understanding number formatting as a facade, where the number formatting displays a number in a certain way without changing the underlying number, why is that so important? Because if we're dealing with this data, we have to understand that, yes that may be displayed as 1%, but it's the 1.45 that will be used in any calculation or data analysis. Now, that is misleading. So of course, when we're applying number formatting, we want to make sure that it is helpful, not misleading. Two other important number formatting topics. Anytime you enter a date or, time automatic date number formatting and time number formatting will be applied. That's so we can do date math, things like subtract two dates to figure out how many days between two dates. One last what is Excel topic. You can think of Excel is doing two main things, making calculations and performing data analysis. Here's an example F2 to put the cell in edit mode of someone created a formula to calculate net income. That's not what we're going to be doing in this class. We're going to be doing the second one, converting raw data into useful information. Now, we need to talk about data and how we store data. Let's go over to the sheet data. What is data? Raw data? It is data stored in its smallest form that allows Excel data analysis features and power tools to work. What that means is this is not good raw data. Why? Because if we needed to sort by zip code, this data has the full address in one cell, rather than broken apart into its smallest form. In this form, we can do things like sort by zip code easily. As another example of bad data, look at this. We have date, product, and sales all in a single cell or a single column here. This is bad data. With this data, we can't use the awesome built in features, like a pivot table, to add sales by product. What we need is to take this bad data and convert it into data we can use, data stored in its smallest form, date, product, and sales. Then, features like pivot table can deal with this data set, and in just a few clicks, we can have our product sales report. Now, the good news is this bad data. Well, that's not the good news. But Power Query, as a tool, that tool was invented to deal with just this situation. With Power Query, we can easily split this bad data into three columns. Now, let's go over to the sheet D or I, because we do not want to confuse the word data the word information. Data, that means the raw data. Information, that's what we convert the raw data into. Data is the unorganized raw data. Alone, it doesn't yield insight. Alone, it's not very useful. Information is organized and presented data. This helps people to make decisions, see patterns, and gain insight. Now it may seem obvious, but lots of people use these words loosely. Data, to us, is the raw data. Information is the final product. Our next topic is proper data sets. I'm going to click on the sheet proper data set. Here are the terms we would like to define. Down here, we have two examples of a proper data set. Now proper data set, the synonyms are table or table of data. In most tools, we just refer to them as a table. Now, the first most important thing about proper data sets is fields or column headers have to be in the first row. So here, we have a transaction table. But very importantly in the first row, I list the names of the column headers, or what are called field names, date, product, ID, units sold, and sales. Over here in our product table, product ID, product, flight range, and retail price. So every single proper data set or table has field names in the first row, fields, column headers, field names, columns, variables, and attributes. Those are all synonyms for what's in the first row. And if you think about it, they help us define the data type. We're organizing our data. Date goes in this column, product ID here, retail price here. Now, this table and this table are sitting in the Excel worksheet. Later, we'll have to import these proper data sets into either PowerQuery, PowerPivot or Power VI desktop. And in those tools, the data type has to be explicitly defined. Here in the Excel spreadsheet, I don't have to specifically say this is an integer, this as a decimal, this is text. But in those other tools we'll have to explicitly define the data type. The next important characteristic of a proper data set is we have rows. That row represents a sales transaction. This row represents a product. Rather than referring to these as rows, they're referred to as a record. This is a record of a transaction. This is our record for the particular product. So a proper data set has field names in the first row, records in subsequent rows. In all the other tools, this table will be sitting alone. However, in an Excel spreadsheet, notice, I'm allowed to put anything I want into any cell. So when you have your proper data set in an Excel worksheet, you have to remember to have empty cells all the way around. If you have any little bit of data, like some note right here, next to, touching the proper data set, that can create a problem when we're trying to create a pivot table or filtering. So we don't want anything around the data set. Now, the next term is delimiter. And I want to go back over to the sheet data. A delimiter is a particular character that separates bits of data. The delimiter here is a dash. So we clearly can see that there are three different things here, date, product name, and sales, separated by a delimiter. Later in the class, we'll learn about text files. Text files are how one system gets data to another system. And in those cases, the delimiter will be a comma, or a tab. The last important thing about proper data sets is naming. We want to name all tables, fields, and other items smartly and consistently. I do not want a name for this table, like table one or columns that say column 1, column 2, column 3. A smart name for a table like this would be sales transactions or F transactions, a smart name for this table, something like product or D product. On the sheet T not C, I want to make sure that we always call this proper data set a table and this visual portrayal of these numbers, a chart. Yes, that seems obvious. This is the table. This is a chart. But you would be amazed how many people call something like this a chart, not us, table, chart. Let's go over to the sheet, Excel tables. Here are two proper data sets stored in an Excel worksheet. Now, any time we have proper data sets in a worksheet, we want to convert the proper data set to an official Excel table. Now, some data analysis features will work when we have just a proper data set. But for example, if we go to the Data ribbon tab, the get and transform group is PowerQuery. We can't even get our data from an Excel worksheet into PowerQuery unless we first convert the proper data set to an Excel table. Similarly, PowerPivot we can't use this button add to the PowerPivot data model, unless they're official Excel tables. Now, let's scroll down. Here's our two Excel tables. There are four requirements for us to convert this proper data set to an Excel table. Now, the first requirement is that it is a proper data set. The second requirement is you click in one cell, then you go up to the Insert Ribbon tab, and then in the tables group, you click on the Table button. Notice there's a keyboard, Control T. So I have a single cell selected, Control T. This brings up the Create Table dialog box. Make sure it says my table has headers. Remember, headers are a synonym for field names. Click OK. The fourth requirement is that we go to the Table Tools Design ribbon tab. And by the way, if I click outside the table, that goes away, back inside the table, there it is. The fourth requirement is that we give our table a smart name. So in Properties, up in Table Names text box, type little f and then transactions. Now, one convention in data analysis is when we have what's called a fact table, the numbers in this transactional sales table are called facts, because those are the numbers we're going to use to make our calculations. So one convention is to always list our sales table or fact table with a little leading lower case f. Our second table, it is a proper data set. I click in a single cell, Control T. I can click Enter to invoke OK. And we go up to Design Properties. And here, since this is a lookup table, oftentimes in data analysis, we call this a dimension table. We're going to use the naming convention where we have little d, and then a smart name, like products. And Enter. Now, we have our two Excel tables. And the reason that we always want to take our proper data set from an Excel worksheet and convert it to an Excel table is because of dynamic ranges. That means if we add any records or take any records away, anything pointing to the Excel table object, we'll see those new records or deleted records and can be updated. That's different than if we just use a proper data set from an Excel worksheet. In that case, we'd have to manually go and change our solution to recognize the new data. Now, that's the reason that PowerQuery and PowerPivot require that we convert our proper data sets to Excel tables. As a simple example, let's create a formula, because it doesn't matter. It could be one of the power tools or anything else we might do, create a chart, create a simple formula. They all update when we add or delete records from our table. So I'm going to create a formula, equal sum, and because this is an Excel table, watch this. I can help her my cursor, and when I see my downward pointing black arrow, I click right above the field name, and look at that. It puts in our table formula nomenclature, the smartly named table, and our field name in square brackets. Close parentheses and Enter. Now, let's add a new record below this table, and see if our formula will recognize the new data. Now, I'm in the last cell in the last record. So I'm going to hit the Tab key to add a new record, date. Now, I want to add sunspot product 2005 Tab, and we sold 16 of them. Now there's our new record. Did our formula update? Yes, indeed it did. I click on the cell, hit F2, and sure enough, it still looks the same. But it totally expanded. These dynamic ranges inside an Excel table are like magic for us data analysts. Now we understand why when Microsoft invented PowerQuery and PowerPivot, they forced us to put our data, if it's coming from an Excel worksheet, into an Excel table. Now, Excel tables also add some formatting, and very conveniently, if you're going to work with the data in the worksheet, we can easily sort and filter, which we will see just in a moment. But I want to jump over to our PDF notes and look at some other examples of proper data sets. This is page 10 in our PDF notes. We want to talk about data types and proper data sets and the various tools we're going to use. Now, example number one. If we have a proper data set in an Excel worksheet, we can declare what type of data goes into each column by naming the column. But we don't have any way to force only dates or only text in a particular column. So we're allowed, in an Excel spreadsheet, to have dates and text in the same column. The problem is this inconsistent data will cause problems. For example, if we try to group dates in a pivot table, we're not allowed because there's inconsistent data. Remember, in data analysis, it's all about that raw data. The same thing will happen if we use an Excel table. If we have inconsistent data, it is going to cause problems. Just like if we have the data in the cells, in an Excel table we can name the columns, but we can't force a single data type on a column. If we bring that same data set into the PowerQuery editor, in PowerQuery, you are required to declare a data type, date, whole number, decimal. By declaring a data type, we are assuring that what we want and expect in this column is going to be there when we make a calculation. Here, because it was text, it gives us an error. That tells us we have to go fix the data or remove the error. If we take this data inside the Power Query editor and load it to the Excel sheet, then we'd get an empty cell, where that error is. And that empty cell will cause problems. If we bring that same table into PowerPivot, look what happens. Everything shows up as text. Now we're going to have big trouble, because in order for a date to be used in calculation, it needs to be a number. Now, in PowerPivot, just like PowerQuery, we'll have to explicitly declare what type of data is in a column. In Power BI Desktop we'll have to, just in PowerQuery and PowerPivot, define the exact data type for each column. Here, if we import that data, we're going to get all text. And of course, that will cause problems. So in all of our tools, when we have our proper data set, we're going to have to be careful about consistent data in each column. Two other proper data sets. Text files. This type of text file, where there's a tab delimeter between the field names and the bits of raw data for each record, this is a common file that we use to get data from one system to another. It is a proper data set, but it's all text. When we use Power Query to import this, we'll have to take the columns, define the data type, and create a proper data set that the analysis tools can use. Finally, it's common in data analysis and business intelligence to refer to the table that has the columns that we're going to use in calculations. It's common to refer to these as facts and refer to the whole table as a fact table. Also, the lookup tables-- these are tables with unique lists-- oftentimes columns from the lookup table, or dimension table are used as conditions or criteria for our calculations. So these types of tables are called dimension tables. Both of these are proper data sets. All right, in these PDF notes from page 10 to 13, we saw eight examples of proper data sets and talked about the importance of data types. Now let's jump back over to Excel and talk about sorting and filtering. Now, sorting is a feature that will be in all of our tools, and it can take a particular column and sort it alphabetically or numerically. Now, at the top of each column, the field name will have a dropdown. If I click that, I see A to Z and Z to A. Now, if you're sorting for the first time, the fear is always if I sort this column, will the records remain intact? And yes, in all of our tools, when we're using our proper data sets, when we sort-- and let's try sorting this A to Z, click the dropdown, A to Z, that record remains intact, as does all of the other records. Now we sorted a single column, and all the records sorted together. Oftentimes, we want to sort by more than one column. Now, if our goal is to sort sales within sales rep, that means as the final sort, I need alphabetically sales rep. And then for each sales rep, in the sales column, I need a mini sort, biggest to smallest. Now, sometimes the goal might be communicated differently. For example, you might hear, hey, sort the sales for each sales rep. Same thing. Now the key is that when we're sorting by two or more columns, whatever the final sort is, that's called the major sort. So for us, the major sort is sales rep. If we're using these little drop downs at the top, we always want to do the major sort last. So I come to sales, and let's do it largest to smallest. Instantly, every number, from biggest to smallest, but notice, once we jump over to sales rep and sort this A to Z, for example, Gigi's records are already in the perfect order from biggest to smallest. So when I come up to sales rep and do A to Z, just like that Gigi's records are together. But we have A to Z, and then within each sales rep, I have biggest to smallest, biggest to smallest, biggest to smallest. Now, the next topic we want to talk about is filtering, so let's go over to the sheet filter. Now here's our notes for filtering I'm going to scroll down. Now the goal of filtering is to take a proper data set and specify some conditions or criteria for extracting not all the records, but just some of the records. As our first example, our goal is to extract only the records for the sales rep Alma. That means we have to look through this column here, and only when Alma is the sales rep do we extract the record, just as for sorting at the top of the column, next to the field name there'll be a drop down. And this filter and sorting feature is in all the tools we use. Now, for filtering, there are some special filters that we'll look at in just a second. But there will always be a list of the unique items from the column that we're allowed to select from. So if I want to select just Alma, unselect all, select Alma. When I click OK, look at that, I have just the records for Alma. Now, in an Excel sheet, the rows that don't match the condition or criteria are hidden. Over in PowerQuery, they'll actually be removed. And if we go to Power BI Desktop and PowerPivot, we don't use filtering as much, except for as the secret trick to look in a particular column to see exactly what the unique list is. All right, so here in Excel, our goal is to copy and paste these. I'm going to click in a single cell, and I want to use the keyboard to highlight the entire table, control asterisk. Now, you can use the asterisk on the number pad or Control Shift 8. Now, I'm going to use the keyboard for Copy Control C. And look at that, the dancing ants are dancing around just the visible records. The hidden rows will not be copied. Now I come over to the sheet Alma records. I'm going to click in some cell and Control V. Our goal has been accomplished. We used filtering based on one condition on the sales rep column to filter and then copy and paste our records. Come back over to filter. To unfilter, we click the Filter icon and say Clear Filter. Now the dancing ants are still dancing, so we have to click Escape. Now, whenever we filter, we actually have to run a logical test. Now, for our first example, we had to ask for each record, is the sales rep equal to Alma? In this case, we got true, so we extracted the record. Then we went down and said hey, is this sales rep equal to Alma? False. Now, that's a single condition. A lot of what we do in data analysis has multiple conditions. So the first logical test we want to talk about, where we have two or more conditions or criteria is an and logical test. Our goal is to show records for the sales rep Alma and the auto Chevy. That means we have to go through every single record in this table and ask two questions of two different columns. So for the first record, I say hey, sales rep, are you Alma? True. Autos, are you Chevy? True. Because I get two trues, I extract the record. Then I got to go to the next one. Are you equal to Alma? False. Are you equal to Chevy? True. I only got one true, so that record is not extracted. Here, I get a false, false, not extracted. Down here, I get true false. That record is not extracted. So when we run an and logical test, a record is included for filtering, later for adding or counting only when we get trues for all the and logical tests. Now, in our example, we only have two questions, but we could have two or three or four or many more. All right, so filtering, it's easy. I just go to my drop down, and from my unique list, I uncheck, and say, hey, I want Alma, click OK. There's Alma. Now I've got to get rid of all the autos that are not Chevy, so I click the drop down and select Chevy Only, click OK. And there it is. Only one, two, three records result from our and logical test. Now, if we're in Excel, and we have multiple columns selected, we could actually go up to Data and look at that. There's a clear. Otherwise, I'd have to go inside each one of the drop downs and clear the filter. Now the next logical test is an or logical test. And our goal for this example is to show records for the sales rep Alma or Rina. Now, get this, because we're asking two questions of one column, I actually have to ask the two questions of the single cell. Hey, are you equal to Alma? True. Hey, are you equal to Rina? False. That record is included, because all I need for an or logical test is one or more trues. Now, when you're running and/or logical test on a single column, we're never going to get two trues, because the sales rep can't simultaneously be Alma and Rina. But I go down to the next one. Are you equal to Alma? False. Are you equal to Rina? False. That one's not included. When I get down to a record with Rina, are you equal to Alma? False. Are you equal to Rina? True. So for an or logical test, it's at least one true. Now for filtering, we uncheck everything, and we simply say, hey, I want Alma and Rina. Now, I want you to notice what I did there. That is how we say it in English. We actually use the word and here. That's what will confuse people some time. And in fact, your boss will come and say, hey, hey, I want to see all the records for Alma and Rina. So when we're using filter, we don't explicitly have to think of the or logical test, because we're just checking and picking the two items we want. But as you get further into data analysis, you have to be clear when you have multiple conditions if it's an and logical test or an or logical test. So I'm going to click OK, and just like that, I can see the column has both Alma and Rina. Those are all the records that match are or logical test. Now, we'll unfilter, and the last filter we want to look at is over here for sales. For each data type, there are specific filters, because it's a number. Look at that. There's a secret drop down. Equals, does not equal, greater than, greater than or equal to, any of these. But look at this, top 10. And by the way, these filters will exist in all of our tools, including pivot table reports. So I'm going to say top 10. Click the down arrow and we'll select five. Click OK. And just like that, I've filtered hiding all the records, where the sales are not top five. Now I'm going to unfilter this. Now, that was our first introduction to logical test. I want to go over to the sheet logical test. Now for an and logical test, we used filtering. We ask the question, hey sales rep Alma and auto Chevy. We got two trues for three different rows. So when we're filtering, we show these rows. And then we extract them and paste them somewhere else. But guess what, an and logical test is used in lots of other situations. For example, this is a pivot table report that we're going to build. That number right there is the sum of sales, but it's not all the sales. It's adding just the sales for auto Chevy and sales rep Alma. Now, internally, inside the pivot table, the pivot table will have to run that same and logical test on the entire data set. But when it does that, it will find the same three records, but instead of showing the records, it'll just take the numbers and add them. Now, we'll see lots of other and logical tests, or logical tests, and other types of logical tests throughout this data analysis class. Wow, that was an epic video. Here are the various topics we covered in an introduction to what is Excel, what's data, not confusing data and information, what's a proper data set. Remember, don't confuse tables and charts. We learned about the amazing dynamic Excel tables. We learned how to sort, how to filter and extract records, and we talked about logical tests. All right, if you like that video, be sure to click that thumbs up, leave a comment, and subscribe, because there's nine more awesome E-DAB videos. All right, we'll see you next video.
Info
Channel: ExcelIsFun
Views: 57,631
Rating: undefined out of 5
Keywords: Excelisfun, Excel Data Analysis Basics, E-DAB, Highline College, Data Analysis, Business Intelligence, BI 348, Mike Girvin, Microsoft Power Tools for Data Analysis, YouTube Education, Tables in Data Analysis, Data Types, Understanding the Difference Between Data & Information, Excel Tables For Dynamic Data, Excel Tables, Excel Table Feature, Sorting Data in Excel, Filtering Data in Excel, AND Logical Test, OR Logical Test, Logical Tests, What is Data?, ProperData Sets
Id: JzjXY7GrjNw
Channel Id: undefined
Length: 35min 27sec (2127 seconds)
Published: Thu Feb 21 2019
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.