Excel Statistical Analysis 04: PivotTable & Power Query to Build Frequency Distributions

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
welcome to excel business statistical analysis video number four and in this video we got to talk about the amazing excel tools power query and the pivot table and we're going to see three great uses for these tools first we'll see how to use the pivot table to build a frequency distribution and that will demonstrate that the pivot table tool is flat out easy then we'll see the amazing power query to import and transform a csv file and finally we'll learn how to use the pivot table grouping feature for quantitative data to build frequency distributions and if you thought you knew how to use the grouping feature you better check this video out because there are some hot tips now in this video we're going to have two different proper data sets that we need to create tabular reports for the first one is going to be a simple one column table in the excel worksheet and we'll create a frequency distribution the other table is going to have multiple columns and it's going to come to us not in the excel file but as a csv file then we'll create this tabular report where we need to group by seconds with an upper and lower limit to count occurrences we'll also create these other two frequency distributions using the pivot table now we're in chapter one excel file and we're on the sheet pivot table here's our data set we went out and took a sample of people's phone preferences and our goal is to take this data create a unique list that means one of each phone count how many times each phone occurred and then calculate the percent of total now the tool we're going to use to create this tabular report is the pivot table now this tool has been in excel since 1993 and it's the most used data analysis tool in the world why because it's so easy to use and in this statistics class most of the reports we're going to create we're going to create with a pivot table now we have a proper data set field name at the top records and rows and empty cells all the way around so we click in one single cell doesn't matter which cell in the proper data set go up to insert tables group and click the pivot table button in the pivot table from table or range dialog box it'll always get this first step correct if you have a proper data set and a single cell is selected so all we have to do is to decide where we want to put it so i'm going to say existing sheet and i'm going to click in d2 click ok now there's a blank pivot table report here and when you create a pivot table the pivot table tool took the data set and stored it in a behind the scenes location called a pivot table cache and most of the time you do not need to worry about that pivot table cache but as we'll see in our next pivot table sometimes it's important to know that this data is stored in two locations over here on the right we get the pivot table fields task pane the top part lists all of the fields we only have one field so there's only one field and the bottom part is where we create our unique lists and the values areas where we make our calculation now to create the unique list i'm going to hover over the field and i can create a unique list by clicking and dragging to the columns notice the blue at the top that means the column area the filter but the one we want is the rows area and notice that blue that means when i drop it instantly i get a unique list in the rows to count the occurrence of each phone we simply drag the same field but now we drag it down to values and because there's text data in this field when i drop it the aggregate calculation defaults to count if this were a number field and we dragged it down to values it would default to sum now those calculations are easy to change and we'll see how to change it many times now i want to change the label at the top of this report so i'm going to type phones and when i hit tab it actually changes the name over here i'll type frequency and enter that's how fast and easy it is to create a basic frequency distribution a unique list a count of each and a total at the bottom of the column that counts the total number of observations now our next task here is to calculate the percent of total so i need to take 41 divided by 236 divided by 200 and so on now last video we saw how to do this with formulas but pivot tables are easy click and drag and drop it a second time we get a duplicate count but now we want to change the calculation and in a pivot table it's easy to do whatever field you dropped into the values area you come into the pivot table and right click and there's two ways to change the calculation summarize values by that allows you to change the aggregate calculation now we have count which is what we want remember our goal is to take each individual count and divide by the column total so now we go to our second option for changing calculations show values as and there's all of these additional calculations that we can make on top of the aggregate calculation and so the one we want is percent of column total so when i click it took each individual value divided by the column total and added percent number formatting in the top cell i'll type a new label and enter now i want to show you the keyboard shortcut to create a pivot table i want to click in a single cell and the keyboard i'm going to teach you is an alt keyboard this is like the coolest efficiency trick in excel now we have a single cell selected and i want you to just tap the alt key so here i go i'm going to tap the alt key what a bunch of screen tips pop up and notice the screen tip by insert is n so now i'm not holding alt i'm going to in succession hit the n key and now what happens the screen tips open up on the insert tab i can see a v next to pivot table so i select v and then we have four locations that we can get data from from table range which is what we want and then three other locations but that t when i hit t bam there's our insert pivot table dialog box now i want to click the escape key and the reason that alt keyboards exist is so that when you're in your job and you do something all the time you can teach yourself the keyboard just to emphasize that these keyboards are for your own personal use i'm going to highlight this range right here and imagine i wanted to clear the formatting well i don't know the keyboard but i know that on the home ribbon tab from the editing group clear drop down there's clear formats so if this is something i do all the time i would teach myself the keyboard now the first couple times you do it you have to go slow because you don't know what the keyboard shortcut is alt i see an h so i tap h i see an e so i tap e and an f so i type f now the first couple times that you do it you have to look at the alt keyboard screen tips and that's why microsoft programmed it that way but after a while since this is something you do all the time you'll memorize it and it will be lightning fast now i'm going to control z on this control z i'm going to click in a single cell alt n v t and i want to put this pivot table on a new worksheet that is the default not only that but when you have a button in a dialog box that's highlighted that means the enter key will invoke that so i'm going to click escape and watch how fast it is to create a pivot table on a new sheet alt nvt enter bam just like that i come down i double click and rename it enter i come over to the pivot table field list drag down to rows drag down to values drag it a second time right click show values as percent of column total phones tab frequency tab percent frequency enter and i'm done now i might want to change the column widths here's a great trick i'm going to select the a column header click drag to c and then double click between any one of the columns and it best fits each one of those columns i'm going to click in the top cell and control and roll my wheel all right let's go back over to pivot tables all right so that's our first tabular pivot table report based on some data in the excel worksheet now i want to jump to the next sheet so i'm going to dot dot dot and for this power query pivot table example we don't have the data in the worksheet we need to import the csv file before we can make our pivot table report now at our people website or in the link below the video i'm going to right click save link as now in file explorer i have my folder selected and i want you to notice that the extension is dot csv and it has an excel icon but it's not an excel file if you double click and open this now in this csv file we get a preview of this table but guess what this is not an excel file dot csv that means if you try to make a pivot table in this csv file and save it you will lose all your work now there's three different options you have you can copy the data from the csv file and then go paste it in the excel file you can use the save as feature f12 is the save as feature and change the extension to dot xlsx or dot xlsm or in our case because this seven column table is not in the proper form for the proper data set that we need we're gonna go over to the excel file and use power query to import the csv and during the import process we'll fix the data and then load it to the excel worksheet now make sure that inside your folder you have the excel file that's where we're going to initiate the import process and you have the csv file now this csv file stands for comma separated value and i want to open it and show you what these values look like let's right click and point to open with you can open it with notepad wordpad or word but if you open it with excel then you see it as an excel file but by opening it with one of these when we click we can see in fact that each bit of data is separated by a comma now there's a special name for that comma it's called a delimiter when you're dealing with data a delimiter means what character separates the bits of data now what's so important about this is that systems communicate with each other by sending data in a comma separated value or you can see text files and instead of the delimiter being a comma it's a tab and knowing what a delimiter is is important because during the import process it'll ask you what is the delimiter we're going to close this in our excel file on the sheet power query and pivot table i want to go up to the data ribbon tab and power query is everything in the groups get and transform data and queries and connections now in this class we'll import data from the get and transform group get data drop down i click point to from file and the two options we'll have for data sources is we'll import data from a different workbook or from a text or csv file we're going to click this option you navigate to the location select the file click import or you can simply double click in the next dialog box the file name is at the top make sure that the delimiter is a comma and you want to click transform data do not click load do not click cancel click transform data when we click this it opens the power query editor and it is a separate window sitting on top of the excel window over on the right we have the query setting pane there's the name of the query and the automatic steps that have been created now before we look at those steps let's highlight the name and i want to rename this query now i'm going to name it something that indicates what type of data this is pick up truck table and enter now applied steps there's three steps these are the automatic steps that power query is taking to import and transform the data the first step source that's just connecting to that csv file and i want you to notice the first row contains the field names those really need to be up at the top as proper field names well the second automatic step when we import a csv file is promote headers and sure enough it promoted the headers now we have proper field names now for each field name to the left of the field name is something called a data type and i want you to click the abc these are all the data types that are available in power query and i want you to notice the left alignment if we imported this data with numbers as text we already saw from an earlier video that that causes big trouble but the third automatic step bam it changed the data type so for acceleration for these pickup trucks if i click the data type icon that 1.2 is the icon to represent decimal number data type that one two three that's the icon that represents the whole number data type and then the remaining ones are abc that indicates text so setting data types is part of the import process is an advantage to using power query because with data types it can assure that our calculations in the worksheet will work correctly alright so these are the three automatic steps but we have one task here now homework problem number 25 in the textbook they give you a data set similar to this and there's an element field but they give the element field as two separate fields and we want it as one field so we need to merge these two fields luckily power query makes this easy i'll select the first field hold shift click the second field and i want to transform these two fields into a single field so in the power query ribbon tabs we want to select the transform tab go over to text column and there it is the merge columns button when i click it wants to know if we want a separator i do want a separator so i'm going to click the drop down go down to custom and type a dash now we need a new column name so i'll highlight and i'll call it something like truck element and now when i click ok i get a fourth step that we created and bam there's our one single element field now we're ready to load this to the excel worksheet so we go over to the home ribbon tab close group if you click the close and load button it loads it to a new sheet and names the sheet whatever the query name is but that's not what we want we want to click the drop down and close and load two this gives you the full power to load it exactly where you want so i'm going to click in the import data dialog box the default is loaded as an excel table on a new worksheet i don't want a new worksheet i'm going to click existing and i happen to be in b2 and that's where i want it now for us we want to load it as an excel table because then it's in the worksheet and from that table we can create pivot tables and formula solutions if you're just doing a pivot table you can load it directly to the pivot table cache and either build a report or a chart and if you ever get more than a million rows of data you can use only create a connection and then check add to the data model that's a behind the scenes columnar database where you hold big data all right existing table i'm going to click ok and there's our imported and transformed table of data over here on the right the queries and connections pane opens up and that's our query if this pane is not open or you want to open it later you come over to data queries and connections and click queries and connections now the reason this is important is because you're allowed to go back and edit the import and transformation steps so for example i actually made a mistake here i did not want a dash i wanted a space a dash and a space even though this is already loaded i can go back and edit we can double click the query to open the query in the power query window here's our applied steps and for the gear icons if you click it allows you to edit so now instead of just a dash i'm going to put space dash space click ok and now we change the separator and now there's a space a dash and a space now there's lots of great uses for editing a query one that i often do is i'll forget to add the right data type so i'm building a pivot table and my numbers are text no problem i open up the query change the data type and when i load it everything's working now we've already loaded this one time so all we have to do is click the close and load and just like that we have a different separator in the first column now in this pickup truck table of data we have one element field and five variable fields in the recommended column we have categorical data at the nominal level in the owner satisfaction column we have ordinal data and then in the overall score overall miles per gallon and acceleration those are quantitative variables all ratio level now overall score is a discrete variable we can only get a whole number value but overall miles per gallon in acceleration are continuous variables that depend on how they're being measured using the pivot table tool our goal from the recommended column we simply want to create this tabular report that counts yes's and no's from the overall score we want to count how many trucks fit into categories such as 64 to 67 68 to 71 and then from the acceleration we want to count with the categories seven to eight seconds eight to nine seconds and so on these three reports are called frequency distributions that's a tabular summary that shows a unique list of non-overlapping categories with the counts the frequency for each category all right to create our first tabular report you click in any one single cell and we're going to use our keyboard alt n vt clicking existing the location i'm going to put it in i2 click ok and this is the power and ease of pivot tables recommended we drag it down to rows and instantly we get a unique list this is a text field so we drag it to values we get our count and now we can rename our column header and we're done it looks like 12 of the vehicles are recommended and nine are not recommended now let's create our next tabular report from overall score click in any one cell and then use the keyboard alt n v t now we're going to put this pivot table let's say in i8 and this is dangerous in the sense of that's a live pivot table up there if i drag some fields into the row area and this report needed to extend down below where i'm putting the second pivot table i would get a warning but in this case we're going to keep that pivot table exactly as it is so i'm going to click ok now we want to drag the overall score down to the rows area and when i drag it down and drop it instantly i get a unique list now my goal is to count and sometimes you want the unique list and then you count other times we'll see how to create those grouping categories with a lower and upper limit but let's see what happens if we drag overall score which is a number field down to values the default calculation is sum but as we know it's easy to change our calculations right click in the values area summarize values by and we're going to change it to count now this pivot table is fine it tells us looks like 83 and 84 and 76 as overall scores each had three occurrences but we want a group to group numbers or dates or times in the row area of a pivot table you click in a cell and right click group and this is a pretty amazing dialog box it shows the min the max and the buy is the increment or the step or the size of the category it's the distance between the upper and lower limit now the difference between 64 and 87 including both the 64 and 87 is 24 so i'm going to take that total range and make sure i give it an increment or a buy that is evenly divisible into 24. so i'm going to try four to start this off and this is really the art side of statistics so when i click ok that is pretty amazing it totally grouped and gave us the labels 64-67 68-71 now what happened to 72 to 75 well there are no scores that fit into that category so then the next displayed category is 76 to 79. now if you want to show that category right click go down to field settings in the field settings dialog box go to layout and print and then check show items with no data now when you do this it shows the other two categories that are hidden and the pivot table is really polite in this regard we typed in 64 and 87 but it knows that we humans make mistakes and just in case there's some data less than the lower limit we give it or greater than the upper limit we give it it includes these two extra categories now i'm going to right click field settings layout and print and uncheck click ok now i do want to change the label and it's easy to change this if you look at this we can clearly see 5 fell between 76 and 79 7 between 80 and 83 and 5 between 84 and 87 so most of them are in the upper range but if you want to see a slightly different shape to the count of the data right click group and we can change it six is also evenly divisible by 24 so let's try six and it looks like we get the same pattern right most of the counts are in the upper scores 76 and above now i'm going to ctrl z and leave it at buy 4 that means there's four possible scores in this category four in this category and so on now the important thing to learn about the excel grouping feature from this example is that when you have a column of integer or whole numbers the categories created will always have a lower limit that is exactly one more than the upper limit from the previous category so 64 to 67 that means the next category has to start at 68. here upper limit is 83 so the next category lower limit starts at 84. now the grouping feature will create different categories when the numbers have decimals so let's see what those categories are let's click in a single cell and we're going to use our keyboard alt n vt click existing and we'll be dangerous and put this in cell i17 click ok now watch what happens when i drag the acceleration column down to rows well of course i get a unique list but when we group right click group and in this case these are seconds and it looks like the range is from 7 to 11 so i'm going to create a starting point of 7 and an upper limit of the last category of 11. we pushed the lower limit of the first category lower than any value in our data set and the upper limit of the last category is larger than any data points in the column now we'll use buy or step or increment of one second now watch what happens when i click ok we get seven to eight seconds and then eight seconds to nine seconds from the display of these categories it looks like we're gonna double count the number eight the number nine and the number ten well the good news is internally inside the pivot table when you use decimal data like this the upper limit is never included so if there's a number 8 in this data set it gets counted here for example this 10 it is not counted in this category it'll be counted in this category the lower limit is included the upper limit is not now let's test this out let's drag acceleration down to values and when we drag it down here remember this is a number field but because we have already grouped these to create categories the aggregate calculation defaults to count we have a count of 8 in the category 10 all the way up to but not including 11. if i change this to a number in the 9 to 10 category let's change it to 9.9 now when i hit enter the pivot table will not update only worksheet formulas update instantly when source data changes when i hit enter we have to come over to the pivot table and refresh the pivot table cache and the way we do that is right click refresh and just like that to 10 to 11 count has changed from 8 to 7 and the 9 to 10 category has changed from 6 to 7. now i'm going to come up and change this back now it's a 10 so that 7 better increase to 8 when i right click refresh and now the 10 is being counted in the last category now we do not want to leave these ambiguous categories when someone's looking at this they don't know the internal workings of the pivot table and that the 8 is not included here but it's included here so when we group with decimal numbers we want to change the report so that it is not ambiguous now there's three ways we can do this one way to do this is to change the label at the top i'm going to hit the f2 key and in parentheses i'll type upper limit not included control enter and then add some wrap text change the column width now i'm going to leave that there and show you the other two methods which if i use the other two methods i probably wouldn't use that method there the other method is to change the dash to something like up to or to use math comparative operators to be 100 percent not ambiguous now the label we could use and this is acceptable you could say the lower limit up to the upper limit and that indicates that the upper limit is not included if you use comparative operators since 7 is the lower limit the seconds to be counted in this category have to be greater than or equal to and i'll type out seconds and it has to be simultaneously less than the upper limit which in this case is 8. now i definitely don't want to type this so the fast and easy way to do this is to highlight the labels go up to the home ribbon tab editing group find and select drop down and the replace feature now i do this all the time and you can see the keyboard there it's ctrl h so let's control h and in find what i'm looking for that dash and then replace by all means use up two that's the shortest and easiest one and this math label that i'm going to create a space less than or equal to and then type seconds and then less than space this sort of is cumbersome if you have a report that you're going to give to someone but it is the most exact i'm going to click replace all click ok click close and let's change this to frequency however you do it don't leave those labels ambiguous now the last important topic about grouping numbers in a pivot table is if you create a new report alt nvt we'll put it down here click ok and i've already grouped overall score so if i drag it down to the rows area the pivot table cache remembers that grouping now sometimes that's not what you want maybe you want the unique list of values here or you want to group it in a different way now the reason that this happens is because when you create a pivot table from a data source it stores the data in a behind the scenes pivot table cache only one time and then every time you make a new pivot table it uses that same cache of data now it does that for a good reason because if it created a new cache of data every time you made a pivot table report in a file with 30 reports you would repeat that data set 30 times and that would blow up your file size and slow things down but luckily if you do want to create a new pivot table with a different type of grouping you can do it if you add the pivot table wizard button to your quack now in an earlier video we learned how to modify this and it's easy right click customize quick access toolbar choose all commands in the list click type the letter p that jumps to the ps and then scroll down and you have to hunt for pivot table and pivot chart wizard and then you add it to your quick access toolbar then you click in a single cell use that button or you can use the old keyboard from excel 2003 and before that's like a 20 year old keyboard alt dp either the keyboard or that button will open up the wizard you select lister database pivot table next it got the data set next i'm going to put it on the existing right next to this and when we click finish now we get a new cache of data and we can drag overall score or acceleration and in each case we get a unique list and now we can group it however we would like all right in this video we learned how to use the pivot table tool to create these frequency distributions based on categorical or quantitative data and we learned how to use power query these two groups in the data ribbon tab to import and transform our csv data now next chapter chapter 2 we'll do descriptive statistics and we'll study tabular and graphical methods of summarizing data and these frequency distributions will be a large part of what we do next chapter and throughout the rest of the textbook because much of probability is based on counting occurrences of items across a unique list of categories now here's another hot tip for all the excel files that we use with the videos the yellow sheets and the red sheets if you highlight them you can print these notes out control p and i did page setup so they print out nicely so if you want your notes about frequency distributions pivot tables and power query or in earlier videos for chapter number one the list of formula elements or cursors or keyboards you can just print those out and have them to read all right that's the end of chapter one inside the folder that you built watching video number one you should have these four files this file you have to download from within canvas this has the homework templates the sheets that you use to complete the problems and it has the solutions for you to check your answer this file right here is the file that you use to import for homework problem number 25. now i want to double click and open this and look at the homework file when you get this yellow security warning at the top click enable and then just as with the video excel files the blue sheet is where you do your homework so this is the data set for homework problem number two and after you're done completing the homework problem from the textbook then you can go look at the answers so for each homework problem the blue sheet is the one you work on the red sheet is for you to check your work after you complete the homework problem all right we'll see you next video
Info
Channel: ExcelIsFun
Views: 7,675
Rating: undefined out of 5
Keywords: Statistical Analysis, Excel Statistics, Highline College, Business Statistical Analysis, Busn 210, Mike Girvin, excelisfun, Mike excelisfun Girvin, Data Analysis, Analytics, Microsoft 365 Excel, Excel Statistical Analysis, ESA 4, Frequency Distributions, PivotTable, Grouping Feature, Grouping Continuous Quantitative Data, Grouping Discrete Quantitative Data, Power Query, CSV File, CSV Excel Data, Import and Transform, Power Query to Import, Lower and Upper Limits
Id: af5lXnidEwA
Channel Id: undefined
Length: 36min 25sec (2185 seconds)
Published: Fri Nov 26 2021
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.