Analyze Data to Answer Questions - Data Analytics Certificate Part 5

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
hey there future data analysts you've made a lot of progress so far it's not an easy journey but you're doing great before you started this program something inside of you convinced you to get your google data analytics certificate you had an idea did some research and made the time to get started then you made the decision to commit to your goal now look where you are that is something to be proud of early on we jumped right into the world of data analytics and saw how data played a part in your everyday life you learned how to navigate spreadsheets and why structured thinking was key to solving problems you also explored the best way to collect and store your data and from there you gain an understanding of clean data and data integrity you've identified how to ask the right questions and learned to clean data now we'll take your skills to the next level next up you'll learn how to come up with clear and objective answers to any data question you encounter earlier we learned about the data analysis process as a quick reminder the phases of that process are ask prepare process analyze share and act we'll explore the analyze phase more here focusing on how to organize and format the data you have so that you can do all sorts of calculations knowing how to analyze the data you've collected and clean is essential to your work as an analyst before we get started i'd like to introduce myself my name is ayanna and i'm excited to be your instructor for this course i'm a global insights manager at google and i've also taught at the google analytics academy which is a training resource for google analyst in my job i help advertisers determine the value of investing in google products when you search for something online you'll often see an ad on the page that's an investment an advertiser has made i use data analysis to show advertisers the value they could gain from investing in those ads that's what i love about being a data analyst figuring out how to create value anytime i enter a situation and the best way to know if you're creating value is if you have evidence for me that evidence is data so now that you know a little bit about my love for data let's talk about what you'll learn here you'll start by covering best practices for organizing your data and the different ways you can sort through that data using spreadsheets and sql we'll also spend time learning three important ways to work with data that'll boost your analytical skills then we'll talk about saving time you'll discover tips and tricks that can help you analyze data more efficiently last but not least we'll work together to identify techniques to help you be as fair and unbiased as possible well that's all you need to know for now coming up we'll break down the basics of data analysis and bring you one step closer to our future in data welcome back it's great to see you again so let's talk about analysis we've learned how to ask the right questions prepare data for exploration and then process that data to make sure it's squeaky clean now it's time for the heart of the process the actual analysis finally right but what is analysis basically analysis is the process used to make sense of the data collected it means taking the right steps to perceive and think about your data in different ways the goal of analysis is to identify trends and relationships within the data so that you can accurately answer the question you're asking to do this you should stick to the four phases of analysis organize data format and adjust data get input from others and transform data by observing relationships between data points and making calculations okay let's apply the four phases of analysis to a real world scenario imagine you want to buy a gift for your friend zara's wedding the problem is you're not sure what to get her fortunately you have a ton of data from her wedding website but instead of reading all the data on her website and scrolling through a photo album of her and her partner you go straight to the online registry a wish list of gifts they'd enjoy the registry is like a data set that you can analyze to make a decision now that you're checking out organized data and the registry you want to make sure that the list of data or gifs in this case is formatted in a way that's easy to reference formatting data streamlines things and saves you time scrolling through hundreds of gifs can be time consuming so instead you can adjust the data in a way that makes it easy to digest by filtering and sorting your data you have a budget you want to stick to so you sort the gif prices from low to high you then filter prices to include gifts that are within your budget of 60 so you're working with a newly formatted list of data at this point it's good to remember that input from other people can also be really helpful when analyzing information and making decisions you can check the list of gifts to figure out if anyone else has already bought any of the items you realize a few of the items in the list have been purchased and this informs your decision when analyzing data gaining input from others is important because it gives you a viewpoint you might not understand or have access to on top of gaining input from other people it's also important to seek out others perspectives early that way if they predict any obstacles or challenges you'll know beforehand the people you look to for input don't have to be experts to be helpful sometimes all you need is for someone who's familiar with the topic or data you're considering in our example that would be zara's wedding guests who are purchasing gifts from the same online registry they probably aren't wedding gift experts but their collaborative effort to mark off the item they purchase can help you figure out what not to buy which will prevent zara from getting the same gift twice so in the end getting input is valuable to your analysis this brings up the last step of the analysis transforming data transforming data means identifying relationships and patterns between the data and making calculations based on the data you have going back to our example you were able to find a gif that you knew zara would like and one that fits your budget you were also able to choose a gift that wasn't already purchased by someone else by finding the relationship between these data points you chose purchase and sent a gif that would answer the problem you wanted to solve the beauty of the analysis process is that you probably already analyze situations in your everyday life whether you're analyzing data in your personal life or in your career these four tasks can help you make better decisions and the more you do it the more comfortable you'll feel with the process i hope this gives you a better understanding of the basics of analysis as we move forward we'll check out how to locate data for analysis both in a spreadsheet and using sql when you're ready you can go ahead see you soon i think one of the coolest things about working with data at google is that we have one of the world's most valuable data sets people refer to google data as really a lens into human curiosity we often look at google as really a proxy for what's happening in the world and so for many of our advertisers they really really value the data and the insights that we're able to give them from google because they believe it's a proxy or a reflection of what's happening in their business or within their industry and so i think the the value of the data that we're able to work with at google really keeps me interested and excited about the work that i do so i came to google about three years ago after spending a few years in consulting and so i was really interested in switching into a role that was really focused on sales and marketing but at the same time i still wanted to be able to leverage the analytical skill set that i had gained prior this role was a great complement to the skill sets that i already had and the interest that i had in moving into the sales and marketing function i think one important thing for all students to realize is that no one learns this material overnight many of your colleagues you may look at as experts but most likely they've been able to gain that level of expertise through their years within the field i think one of the biggest attributes that students should keep in mind is that the most important thing that they need to have throughout this learning journey is grit grit to understand that it may be a struggle it may be a challenge but if you put in the work and you put in the time these concepts will eventually click and you'll be well on your way to becoming a data analyst hi my name is ayanna and i'm a global insights manager here at google hi again let's jump back in right now we're in the analyze phase of the data analysis process and even though each phase is unique data analysts make decisions about organization throughout all of them that's what we're talking about here organization it's super important that you keep your data organized throughout your analysis how your data is classified and structured will impact your findings whether you're working in a spreadsheet or a database and once you know how your data is organized you'll be able to capture or collect the information you need most of the data you'll use in your analysis will be organized in tables tables help you organize similar kinds of data into categories and subject areas that you can focus on as you analyze for example this basic database has tables for car dealerships product details and repair parts each table then has several fields of data like branch owner and the cost of repair parts you can use these tables and fields to help you decide how to move forward with your analysis the structure of this database can help you decide which data you need to pull to meet your objectives for example the total number of a particular brand of car sold or repair part for specific make and model of a car at a certain branch tables allow you to make decisions about data types they help you to figure out what variables you need and the data type those variables should have so you have a database where you need to convert a data type during your analysis you can do that by using the cast command in sql or any other method that you learn on the job or from your own research like this example where we converted a purchase price column to be a float instead of a string so that it was in a numerical form we could use for calculations if you're performing your analysis in a spreadsheet you want to make sure that the columns and rows are effectively organized you can even hide columns that you won't need for analysis or that show duplicate information once you have the data organized and formatted you'll be ready to sort and filter it to find the data you need we'll cover sorting and filtering soon but for now just know that both filters and sorts are affected by the type of data we're working with the bottom line is that it's important to have your data in the right format so always be prepared to adjust no matter how far into your analysis you are that's all for now coming up we'll show you what filters are all about bye hey great to see you again earlier we talked about why you should organize your data no matter what part of the life cycle it's in just like any collection it's easier to manage and care for a group of things when there's structure around them now we should keep in mind that organization isn't just about making things look orderly it's also about making it easier to search and locate the data you need in a quick and easy way as a data analyst you'll find yourself rearranging and sifting through databases pretty often two of the most common ways of doing this are with sorting and filtering we briefly discussed sorting and filtering before and it's important you know exactly what each one does sorting is when you arrange data into a meaningful order to make it easier to understand analyze and visualize sorting ranks your data based on a specific metric that you can choose you can sort data in spreadsheets and databases that use sql we'll get to all the cool functions you can use in both a little later on a common way to sort items when you're shopping on a website is from lowest to highest price but you can also sort by alphabetical order like books in a library or you can sort from newest to oldest like the order of text messages in a phone or nearest to furthest away like when you're searching for restaurants online another way to organize information is with a filter filtering is showing only the data that meets a specific criteria while hiding the rest typically you can use filters when you want to narrow down the amount of data you want to sift through say you're searching for green sneakers online to save time you filter for green shoes only using a filter slims down larger data sets to smaller subsets that are relevant to what you need sorting and filtering are two actions you probably perform a lot online whether you're sorting movie show times from earliest to latest or filtering your search results to just images you're probably already familiar with how helpful they can be for making sense of data now let's take that knowledge and apply it when it comes to sifting through large disorganized piles of data filters are your friend you might remember from a previous video that you can use filters in spreadsheet programs like excel and sheets to only display data from rows that match the range or conditions you set you can also filter data into sql using the where clause the where clause works similarly to filtering in a spreadsheet because it returns rows based on a condition you name let's learn how you can use aware clause in a database we'll use bigquery to access the database and run our query if you're joining us open up your tool of choice for using sql and reference the earlier resource on how to access the data set otherwise watch as the where clause does its thing here's the database you might recognize it from past videos basically it's a long list of movies each row includes an entry for the columns named movie title release date genre director cast members budget and total revenue it also includes a link to the film's wikipedia page if you scroll down the list the list goes on for a long time of course we won't need to go through everything to find the data we want that's the beauty of a filter in this case we'll use the where clause to filter the database and narrow down the list to movies in the comedy genre to start we'll use the select command followed by an asterisk in sql an asterisk selects all of the data on a new line we'll type from and the name of the database movie underscore data dot movies to filter the movies by comedy we're going to type where then list the condition which is genre genre is a column in the data set and we only want to select rows where the cell in the genre column exactly matches comedy next we'll type the equal sign and write the specific genre we're filtering for which is comedy since the data in the genre column is a string format we have to use single or double quotations when writing it and keep in mind that capitalization matters here so we have to make sure that the letter casing matches the column name exactly and now we can click run to check out the results what we're left with is a shorter list of comedy movies pretty cool right here's something else that you should know you can apply multiple filters to a database you can even sort and filter data at the same time for even more precise results as a data analyst knowing how to sort and filter data will make you a superstar that's all for now coming up we'll get down to the nitty-gritty of sorting functions and spreadsheets see you there hey there data pro happy to see you back and ready to explore more of the organizational side of data analysis in this video we'll learn how to sort data in spreadsheets we've done some sorting in spreadsheets earlier in the program now it's time to build on what we've covered and introduce some more advanced sorting techniques sorting is amazing not only does it add order and meeting to your spreadsheets it also gives you the power to reimagine data all together when you sort data based on a specific metric you can uncover new patterns and relationships within data sets you might not have otherwise noticed this is especially true for spreadsheets which you'll use a lot in your work as a data analyst knowing how to sort data in spreadsheets can make you a stronger and more confident analyst in many ways sorting relies on your creativity to reimagine the information you have in front of you in spreadsheets you can sort data by ascending or descending order using numbers or letters if cells are labeled with color you can sort them by color too when sorting data in a spreadsheet you can choose to sort sheet or sort range if sort sheet is applied all of the data in a spreadsheet is sorted by the conditions of a single column but the related information across each row stays together on the other hand sort range doesn't keep the information across rows together when you sort a range you're selecting a specific collection of cells or the range that you want the sorting limited to nothing else on the spreadsheet gets rearranged but the specified cells there's two methods for sorting spreadsheet data one involves using the menu the other involves writing out the sort function for now we'll focus on sorting with the menu we'll get to writing out functions later on now depending on the program you use the process might seem slightly different but the instructions and concepts we discuss will be basically the same okay back to sorting with the data menu to give you an idea of how to do it we'll use the movie spreadsheet let's check it out in this example we'll sort movies by release date so we'll head to column b which is listed as release date and click on the b button to highlight all the cells in the column from there we'll head to the data tab in the menu now you have two choices sort a sheet or a range of data you'll notice that we've selected just the release dates but these release dates are specifically related to the movies in their row in this case you want the release date and the movie title to stay in the same row as you sort because they're related to do this you'll want to sort sheet this will keep all the data together by row no matter how you sort it depending on the order you want the release dates to be in you can sort from a to z which also ranks the dates numerically or you can sort from z to a which will sort data the opposite way since we want the release dates to be in order we'll click sort sheet by column b from a to z and there you go you just sorted a sheet of data using the menu now the movies are arranged in chronological order based on release date okay let's say you want to sort data in a specific column but don't need the cells in that column tied to a specific row of information instead you want to isolate the column's data and sort it on its own without affecting how the rest of the sheets arranged for fun we'll use the movie title column in this example first we'll select the column we want to sort column a clicking on column a highlights all the cells in the column which contain the movie titles then we'll go to the menu and click data because we're isolating the column from the rest of the sheet when we sort this time we'll click sort range by column a for this example we'll sort the movie titles alphabetically from a to z and that's it you'll notice that sort range doesn't keep the rows together so the data is a bit jumbled you'll probably end up using sort sheet more often but it's important to understand them both so you don't accidentally get them mixed up you've just sorted data in a spreadsheet using the menu and you've learned how to sort data by entire sheet or by a range of cells that's something you'll be able to take with you wherever you go as a data analyst coming up we'll learn about the second way to sort in a spreadsheet by writing out a function we'll also take sorting to the next level by custom sorting your data see you there happy to have you back earlier in the program we covered some basics of sorting in spreadsheets we learned the differences between sorting a range and an entire sheet and how to sort a spreadsheet using the menu now that we've laid the groundwork it's time to move on to more advanced ways to sort information we've talked about how there's two methods of sorting data in spreadsheets the first method uses the data tab in the menu of your spreadsheet program the second way to sort information in a spreadsheet is by writing a sort function in spreadsheets functions are preset commands that perform a specific process so in this case the sort function as you might be able to guess sort your data let's check out the spreadsheet of party plans to witness the sort function in action it's good to keep in mind that when you use the sort function you're actually changing the existing data set unlike when you use the data tab in the menu which rearrange the data in the original data set the first arranged set of data is our original data set of guests and some information about them so let's say you want to sort the party guess by table to get an idea of who will be sitting where to do that start by typing a function and an empty cell just like any function you do this by typing the equal sign and then write sort after it after your first open parenthesis reference the first cell in which data is collected from in this case that's a2 then you'll include a colon and write the last cell you want included in the function which is d6 a2 colon d6 is the range for this function next write a comma to separate the range from what we're sorting by which is column b you should keep in mind that this part of the function doesn't recognize column letters so in this case we use the corresponding number instead which is 2 since column b is the second column in our range now add another comma and this next part you'll need to decide whether you want the data in this column to be in ascending or descending order a true statement is in ascending order and false is descending because we want the tables to be listed starting from table number one we'll write true for ascending order and then end the function with a closed parenthesis now let's see our function play out our party guests are now sorted by which table they're seated once you have an idea of the data you want to be sorted and how applying functions to your data is simple now you have two different tools in your toolbelt for sorting data after you've tackled writing sort functions you'll want to customize sort orders too a customized sort order is when you sort data in a spreadsheet using multiple conditions this means that sorting will be based on the order of the conditions you select let's go back to our party spreadsheet imagine you want the guests to be sorted by whether or not they've been sent an invitation and based on that we want those guest names to be listed alphabetically you can do that easily with the sort range option under data first highlight all the data in the set from cells a1 to d6 then under the data tab in the menu click sort range in this case check data has a header row which makes sure that the title of the column isn't mixed into the sorting then we'll make sure it's being sorted by sent invitation here we want the no responses first and the yes response is second so we'll make sure a to z is clicked to sort the responses in that order because we want to add an additional sorting condition we'll now click on add another sort column the guest names should be in alphabetical order so let's select guest names and sort from a to z then we'll click sort and voila you've officially applied a custom sort order like a champ okay so you've tackled sorting in spreadsheets by sheet by range through the menu and by using a function on top of all that you've added to your organizational skills by learning how to create custom sort orders pretty soon you'll learn another powerful tool how to sort data using sql even though databases can sometimes be a lot to digest learning these skills gives you the power to rearrange data in a way that makes sense to you once you've sorted data in a way that really clicks you'll understand why it's so valuable to you as a data analyst bye for now [Music] i'm emma and i'm a product analyst on google health the product that i am helping analyze data for is a tool for clinicians and what this is going to allow clinicians to do is find their patients health data just as easily as you can find data on google search what i specifically focus on is standardizing health care data analyzing it to find anomalies or data quality issues and talking to product managers about which features we should launch and why i've worked on problems from solving locomotive failures and predicting them before they happen to forecasting sales for handbags before the handbag is even launched to now working with healthcare data to try to allow clinicians to easily have their patients data at their fingertips i was really interested in working in data analytics but i was trying to figure out what type of data i wanted to work with or what field within data analytics i wanted to focus on because there are so many and i was ultimately drawn towards working with healthcare data i just really fell in love with all of the problems that we have today in healthcare and how just this vast amount of data that's available in the healthcare industry could be better utilized to help patients to help clinicians to improve population health working with meaningful data sets is what really makes me excited to come to work every day and excited to solve these problems what i found in my career is that following the data sets that interested me the types of problems that interested me always yielded better results because i was just that much more driven to go to work every day to do my best to solve these interesting problems because it's just what interested me the amazing thing is there's data everywhere there are data problems at every company in every field and you really get to just follow what you're passionate about hello there if you're hoping to learn about sorting and sql this time you've definitely come to the right place so far we've sorted spreadsheets through the menu and with a written function which brings us to the next part of our learning more sort functions but this time in sql data analysts love playing with the way data is presented sorting is a useful way to rearrange data because it can help you understand the data you have in a different light as you've probably already noticed a lot of things you can do in spreadsheets can also be done in sql sorting is one of those things we've talked about using sql with large data sets before when a spreadsheet has too much data you can get error messages or it can cause your program to crash that's definitely something we want to avoid sql shortens processes that would otherwise take a very long time or be impossible to complete in a spreadsheet personally i use sql to pull and combine different data tables it's much quicker than a spreadsheet and that usually comes in handy here's something pretty helpful you can do with sql you can use the order by clause to sort results returned in a query let's go back to our movie spreadsheet to get a better idea of how this works feel free to follow along in a sql tool of your choice as we go as a quick refresher we have a database of movies listed with data like release date director and more we can sort this table in lots of different ways using the order by function for this example let's sort by release date first we have the select function and an asterisk keep in mind that the asterisk means all columns are selected then we have from and the name of the database and table we're in right now now let's check out the next line it's empty but that's what we'll write our order by function the order by command is usually the last clause in your query so back to the actual sorting we'll type order by with the space with this clause you can choose to order data by fields in a certain column because we want to sort by release date we'll type release date by default the order by clause sorts data and ascending order if you run the query as it is right now the movies will be sorted from oldest to the most recent release dates let's run the query and see what we've got you can also sort the release dates in the reverse order from the most recent dates to the oldest to do this just specify descending order in the order by command written as des d e s c run this query as you'll notice the most recently released films are now at the top of the database in spreadsheets you can combine sorts and filters to display information differently you can do something similar in sql 2. you might remember that while sorting puts data in a specific order filters narrow down data so you only see data that fits the filter so for example let's say we want to filter movies by genre so that we're only working with comedies but we still want release dates to be sorted in descending order from most recent to oldest films we can do this with the where clause let's try that now first we'll check that the order by clause is always the last line that makes sure that all the results of the query you're running are sorted by that clause then we'll add a new line for the where clause after from and before order by here's what we've got so far from there we want to type the column we're filtering for in this case we want to filter the database for comedies so after the where clause we'll type the column list name as genre now we'll add an equal sign after genre because we only want to include genres that match what we're filtering for in this case we're filtering for comedy so we'll type comedy between two apostrophes now if you check out the entire query as a whole you'll notice that we're selecting all columns and we know it's all columns because that's what an asterisk means the from clause specifies the name of the movie database we're using and the where clause filters the data to include entries whose genre is specified as comedy and then in the last line we have the order by clause which will sort the data we've chosen to filter by release dates in descending order this means when we run the query we'll only have comedy movies listed from newest releases to oldest releases so let's run it and figure out if that's the case cool check out all those comedy movies and the way those dates are sorted now let's take this query a step further we'll filter for two conditions at once using the and filter working off the query we've been using we'll add a second condition in the where clause we'll keep the sorting the same let's say you wanted to filter by comedy movies and movies that earned over 300 million in the box office in this case after the and function you add the revenue condition by typing revenue from there you'll specify that you only want to return films with revenues over 300 million dollars to do that type the greater than sign and then the complete number of 300 million without commas now let's run the query here the data only shows comedy movies with revenues of over 300 million dollars and it's sorted in descending order by release date it looks really good you just filtered and sorted a database like it's your job and with practice one day it can be and just like that you finish another step in your data analyst journey by now you really dug and learned about the analysis process with a special emphasis on how organization can change how you go through your data you also learned about both spreadsheets and sql and how to sort and filter data in both types of programs to help you get more comfortable using spreadsheet and sql features you'll be getting some materials you can use as a resource coming up we'll check out how an organizational mindset can take your analytical skills even further we'll also cover converting formatting and adjusting data to combine information in a way that makes sense learning those skills early on can make your work as a data analyst much more efficient and effective in the long run see you soon hey it's great to have you back you've learned so much already and now you're ready to start analyzing data coming up we'll cover some final things you'll need to do for your analysis to make sure your data is formatted and adjusted correctly we'll start converting and formatting your data using data validation and spreadsheets and conditional formatting you'll also learn how to combine multiple pieces of data and finally we'll talk about how to get support during your analysis and find resources whenever you're stuck these skills will help make sure that your data analysis process is as smooth as possible even when it isn't you'll know how to tackle any problems that might come up a big piece of being an analyst is troubleshooting and problem solving you're as good of an analyst as your ability to ask the right questions which is why we'll spend some time learning about problem solving strategies you can use during analysis so whenever you're ready to start learning about data formatting and solving problems head to the next video and we'll get started welcome back while we're learning about formatting data i want to talk to you about another spreadsheet feature data validation in this video i'll teach you a little bit about data validation and show you how to use it for now when i say data validation i'm talking about the function which is different from the data validation process we'll get into that later on but first let's talk about what data validation does in spreadsheets basically it allows you to control what can and can't be entered in your worksheet usually data validation is used to add drop-down lists to cells with pre-determined options for users to choose from if you have a spreadsheet with a lot of collaborators this can make it easier for them to interact with your table you can think of it like a multiple choice question on a quiz and since you can control what's being entered into the worksheet it cuts down on how much data cleaning you have to do later on let's figure out how we might do that for this example we'll work on a project with a lot of milestones and deadlines to keep track of let's say our team has a spreadsheet that tracks everyone's progress but instead of making everyone write in where they are on their task individually we can provide a drop down menu with multiple options like not yet started in progress and ready so we'll select the column that we want to add the drop-down menus to in this case the status column then we'll go to the data pull-down menu here at the top and click data validation this brings up a pop-up menu with options for data validation in this case we know that we want to add a list of items for other users to choose from so we'll select the list of items option and the possible criteria and type in the selections we want to create then hit save and now all of those cells have drop down menus that we can use to easily mark progress for each task but there's other things that you can do with data validation and spreadsheets too like creating custom check boxes to do this let's select the cells under the review column to make a checkbox that will let us know if tasks have been approved or not we'll go back to the data validation menu but instead of choosing list from a range we'll choose check box there's an option to use custom cell values let's choose that and put in approved and not approved now these tasks can be checked off by whoever's reviewing them like a project manager for example another way we can use data validation is to protect structured data and formulas the more people that are working together in a spreadsheet the more likely someone can accidentally break a formula but good news the data validation menu has an option to reject invalid inputs which helps make sure our custom tools will continue to run correctly even if someone puts the wrong data in by mistake all right now you know three uses for data validation in your spreadsheets adding drop-down lists creating custom checkboxes and protecting structured data and formulas data validation can help your team track progress protect your tables from breaking when working in big teams and help you customize tables to your needs coming up we'll learn more about conditional formatting and some ways you can use conditional formatting and data validation together see you soon hi again so earlier we talked about conditional formatting as a spreadsheet tool that changes how cells appear when values meet specific conditions this lets you add visual cues to your spreadsheets that make it easier to understand your table at a glance and it makes the information in the worksheet clearer to your stakeholders in this video we'll take that even further by combining conditional formatting and data validation to create custom tools for our spreadsheets so far we've used conditional formatting to highlight empty cells that still needed data so that we could quickly pinpoint what information our table was missing and added in now let's build on that by using it to make our scheduling table easier to read at a glance here's a table we worked with when we covered data validation it's tracking the status of different tasks on our project for our team to check on but now there's even more tasks than the last time we looked at it this table has useful information but it takes a second to understand right now we don't have a visual on how many tasks are in progress or how many upcoming deadlines there are but if we color coded those elements of the table we could quickly see key pieces of data really easily let's start with the status column column c in the last example we created these drop down menus with the data validation tool now we can use conditional formatting to add some color let's go to the conditional formatting option under the format menu this brings up a sidebar where we can select our range rule and formatting style we need to decide which rows to apply our formatting to when the condition we set is met we can click this button in the range options to select all of the rows we're applying the formatting to instead of typing it in now that we have those cells selected we can choose the rule that we want to apply to these cells we already have drop down menus with specific text so we can choose format cells if text is exactly from the rules for our first rule let's write not yet started as the text condition then we'll choose a color to apply to those cells that have not yet started in them let's use red now all cells that have not yet started selected from the drop down menu will be red let's hit the add another rule button to add conditional formatting to our other status options let's add the condition in progress next we can make that one yellow and then we'll add one last rule for ready let's choose green and there now we have an easy to understand visual cue that tells us how many tasks are in progress and how many are completed we can also combine data validation and conditional formatting to track upcoming deadlines we have a column of dates called review by this date first let's use the data validation functionality to make sure users only enter valid dates we'll go back to the data drop down at the top pull up data validation and select date as our criteria then we can go to the format menu at the top go down to conditional formatting and open the sidebar again we'll click the select range icon and select the review by this date column now under format rules we can select data's after which will give us another option let's choose today and finally let's choose the color for these cells so if the date listed in these rows is after today it'll be filled in orange you can also choose a specific locked date if needed but for now let's go with today now all of the upcoming review dates have an easy to see color code so anyone using this table can quickly reference these deadlines you'll find that some spreadsheet programs like excel have built-in color codes that you can use too and there you go now you know how to use data validation and conditional formatting to create custom tools and visual cues that make your information easy to understand there's a lot of different ways to use these tools so feel free to experiment with them in your own spreadsheets coming up we'll keep learning about new tools for spreadsheets and sql bye for now great to see you back in this video we'll build on what we learned about concatenate and import range by exploring a new sql query concat you might remember that concatenate is a function that joins together two or more text strings and as a quick reminder a text string is a group of characters within a cell most often composed of letters you've seen how this works within a single spreadsheet but there's a similar function in sql that allows you to join multiple text strings from multiple sources concat let's use concat to combine strings from multiple tables to create new strings for this example we'll use open data from citibike which is a public bike sharing system in new york as you learned earlier open data initiatives have created a ton of data for analysts to use openness or open data is free access usage and sharing of data it's a great resource if you want to practice or experiment with the data analysis tools you've been learning here we have open access to the new york city bike sharing data which has information about the use of the shared bikes across the city so now we can use concat to pull and concatenate data from the different columns stored here the first thing we need to do is figure out which columns we need that way we can tell sql where the strings we want are for example the bike sharing company has two different kinds of customers one time paying customers and subscribers let's say we want to find out what routes are most popular with different user types to do that we need to create strings of recognizable route names that we can count and sort we know that the information we need is in the station and trips table so we'll start building our query accordingly first we'll input select user type to let sql know that we want it as a column then we'll use concat to combine the names of the beginning and ending stations for each trip in a new column we'll type in as route to make a route column this will make these route names easy for us to read and understand after that we'll want sql to count the number of trips so we'll input count and an asterisk to tell it to count up the number of rows in the data we're selecting where each row represents a trip in a new column now let's also create a column for trip duration which we can just round up notice how our sql functions first cast the values in the trip duration column as integers then calculates the average value for the trip duration and then rounds that value to two decimal places next we'll tell sql where this information is stored so we'll use from and the location we're pulling from since we're using count and average functions in our select clause we have to use group by to group together summary rows let's group by the start station the end station and the user type for this query finally we'll use order by to tell it how we want to organize the data for this we want to figure out the most common trips so we can input the number of trips column and use des to put it in descending order finally we only want the top 10 so let's add limit 10 and now thanks to concat we can easily read the route names and trace them back to real places we can see which kinds of customers are taking which routes which can help the bike sharing company understand their user base in different parts of the city and where to keep more bikes for people to rent being able to combine multiple pieces of data can give you new ways to organize and analyze data and there's a lot of different tools to help you do that now you've seen concat in action and later you'll come across another similar query join but up next we'll talk more about working with strings see you soon hey welcome back so far we've worked with strings in both sql and spreadsheets before and we've learned that they usually have similar functions in this video we'll take another look at lin left right and fine you've come across these functions used in sql but now you'll find out how they work in spreadsheets going back to our bike sharing data set let's check out one of their spreadsheets this is one of the trip data spreadsheets in the start time and stop time columns there are strings that contain information about date and time of each ride this is all useful data but chances are we'll only need part of these strings to make a formula or answer a question for example these strings contain multiple data points like date and time but if we're trying to find the average time between start times we won't need the date we can actually use lin left and right and find to split the timestamps into separate columns if we want let's build a simple formula to separate the dates in these strings we know that lin tells us the length of a string let's check how long these date time strings are now to start we'll input the first part of the formula and then we'll just select one of the cells with the date time string in it these strings are 19 characters long we can use the find function to locate specific characters in a string keep in mind this is case sensitive so if you're using find to pull a sub substring make sure that you've input the substring correctly we notice that all of the date time strings have a space separating the date and the timestamp so we can actually use fine to figure out where the date ends okay seems like the space is the 11th character in this string so the timestamp substring will start at character 12. we can use the left and right functions to select which parts of the string we want to isolate in a new column we'll use right on one of these cells to indicate that we want to grab the right side and like we've come across before left actually works exactly the same way now we can apply that to the rest of column c to pull those timestamps as a data analyst being able to work with strings is a key skill especially when you find yourself working with data from outside sources hopefully you're a little bit more comfortable applying lin right left and find functions in both sql and spreadsheets later on we'll use these functions to perform even more complicated formulas so feel free to try them out on some data yourself maybe even some open data like we've been using today see you later hi there so data analysts spend a lot of time problem solving and that means there's going to be times when you get stuck but the trick is knowing what to do when that happens in this video we'll talk about the importance of knowing how to get help whether that means asking someone else for help or searching the internet for answers asking other people about a problem you're having can help you find new solutions that move a project forward so it's always a good idea to reach out to your peers and mentors especially if they're working with you on that project your team members have valuable knowledge and insight that can help you find the solution you need to get unstuck sometimes we spend a lot of time spinning our wheels saying i can do this myself but we can be way more productive if we engage with other people find new resources to lean on and try to get as many voices as we can involved for example let's say you're working with the bike trip time data from the previous videos maybe you're trying to find the average time between bike rides in a given month calculating the difference between bike rides before midnight is easy but you can run into a problem if the elapsed time crosses into the next day if someone went on a bike ride at 11 pm but the next ride wasn't until 6 am your formula will return a negative number because the end time is less than the start time you know that you can add one minus the start time if two bike rides start and end on different days but that formula won't work on times that happen in the same day and it's pretty inefficient to scroll through every bike ride to pinpoint these special cases you need to find a way to build a conditional formula but you aren't sure how so you decide to check in with other analysts working on your team to see if they have any ideas you could send them a quick email or stop by their desk to find out if they have a minute to talk it over with you turns out they had a similar problem on a previous project and they're able to show you a conditional formula that you could use to speed up your calculations great they suggest using an if formula like this this basically says that if the end time is larger than the start time replace the standard end time minus start time formula with one minus start time plus end time now it's also possible that your team members don't have an answer that's okay too there's definitely someone else with the same problem asking the same questions online knowing how to find solutions online is an incredibly valuable problem-solving tool for data analysis there's also all kinds of forums where spreadsheet users can ask questions and you never know what you can turn up with just a basic search for example let's say you look at calculate number of hours between times spreadsheets and find a helpful walkthrough for a more complicated formula using mod this flips the negative values into positive ones solving your calculation problem whether you're asking someone you know or searching the internet for answers reaching out for help can give you some really interesting solutions and new ways to solve problems for future analysis coming up we'll learn even more about searching for solutions online see you soon hi my name is layla and i'm an analytical lead at google so an analytical lead is someone who helps advertisers understand the value of their advertising dollars we also help them understand if they were to spend another dollar on ads where should they spend that dollar and what could they expect to get from it the skill set involved in this kind of role has all to do with being able to look at a data set and make sense of it and then tell a story to people who maybe don't have that same experience level with data what's going on in the data what's driving growth for your client or your company what could they do more of or less of to drive more of what they want to happen the analyze stage is like preparing a fabulous meal you have done all the cleaning and the preparing and the cooking and you're finally able to take a bite and to see if what you were originally hoping to happen or what you were expecting to see if that is really the case is it delicious is it exactly like you expected or is the consistency a little off and you need to add a little bit more salt the analysis stage begins once you've prepped and cleaned your data you don't want to have those blank fields that will throw you off or duplicate entries that will enlarge your data set beyond what's actually true the analyze stage is where you become the expert about your data set here you're going to understand all of the different fields you're going to understand their averages potentially the median of the data you're going to understand how different rows and your data differ from each other and it's where you're going to gain the confidence to be able to explain your findings to an audience that maybe does not have the same level of expertise with data that you have when i analyze data i often like to use sql and spreadsheets you can use these tools to for example sort your data and understand which entries are larger than others or to understand how many times something happens by selecting maybe the distinct entries here you can also filter out data that you're specifically interested in analyzing or in a spreadsheet use conditional formatting to show which entries show a more positive outcome and which ones are maybe more negative using sql and spreadsheets to help you through the analyze phase are absolutely crucial with these tools you can format your data set in a way that is digestible and then begin to tell a story with the data my favorite part of working in a spreadsheet is when you finally have that data that you want cleaned up and exported from your sql query then you get to turn it into a pivot table and chart out exactly the cut of data that you were interested in looking at in the first place and explore the trends that are happening there when you get to do that you basically unlock a whole world of information and you get to pick what story that you actually want to tell with your data without just saying okay this is the largest number that's the smallest number here you get to show what is happening over time potentially or what you should expect to see in the future welcome back earlier we talked a little bit about finding resources online to help you figure out solutions to problems during analysis the internet has so much knowledge and advice to offer but you need to know how to find it so in this video we'll talk more about finding answers online you might think that great data analysts don't regularly rely on outside resources but that's a myth the best data analysts know that finding answers to their problems online can be empowering and give them new knowledge for the future being able to find new ideas and combine them with what you already know can help you come up with some amazing things so don't be afraid to turn to the internet to find your answers it's a great resource that lots of analysts use me included but let's talk more about how you can make sure you're using web resources in the best way possible there's a combination of best practices that you can use to guide your search for answers online by practicing the thinking skills we've learned in this program using the right data analytics terms and your basic knowledge of analysis tools you have everything you need to find answers and apply them to your own work and it starts with how you approach a problem mentally you've learned about different kinds of thinking skills and how to practice them and your data analysis work from analytical to mathematical to structured thinking this helps build your mental model or your thought process and the way you approach a problem data analysts use these thinking skills to approach a problem logically and break it into smaller parts building this into your own problem solving process can help you pinpoint specific questions which you can use to find resources more easily for example maybe you keep running into an error in your analysis you narrow it down to two possibilities your formula or the data itself you double check your formula and you see that it's correct so now you know that you need to make sure that data has been entered correctly you consider the problem logically and tracked it down to the source using your mental model next it's important to use the right terms when searching for solutions knowing how to frame data analytics questions with the same language other analysts are using will help you get more search results and it'll help you understand what other analysts are saying for example maybe you need to use the left four characters of a string for a column in sql how would you search for this searching for four characters in a column is a little vague and might not bring up specific resources but left string query sql uses some keywords that other data analysts are also using to talk about these things on top of being able to use the right terms to search online you also need to be familiar with basic tools that way when an online resource is walking you through a new function in a tool you've used before you'll know how those tools work for example if you find a spreadsheet formula online you need to understand how formulas work to apply it to your own spreadsheet or maybe the data set you're working with is too large for a single spreadsheet and you'll need to switch to sql having a variety of tools in your tool kit is important as a data analyst but just as important as knowing when to use them if you find yourself stuck on a problem it can be a good idea to take a step back and reconsider how you're approaching a task we've covered a lot of tools that you can use as a data analyst throughout this program and later you'll learn one more r we'll talk about r more later on but here's a sneak peek to get you excited for it r is another programming language but it's not a database language like sql it's a programming language frequently used for statistical analysis visualization and other data analysis r is a little different from other tools we've been working with but it's a great complement for the tools you're already using and it will give you more potential solutions when you run into problems using the thinking skills we've learned throughout this program the right terms and your understanding of different analysis tools will get you ready for the next steps of this process actually searching for answers online there's a lot of resources like program support websites and forums where other data analysts are asking and answering questions in an earlier video we ran into a problem trying to calculate the time elapsed between bike rides and the bike sharing data maybe our first search calculate time in spreadsheets didn't turn up the answers we needed by thinking about our specific question and how other data analysts might be asking it we could change that search into conditional formula for calculating elapsed time in spreadsheets and now we have more specific solutions to our problem finally being able to modify example code to fit your own needs is so useful understanding the syntax of formulas and functions for different tools will allow you to take what you learned online and make it work for you and maybe even build on it to create a whole new solution for example the mod formula we built to account for trips that started and ended on different days and our bike sharing data the mod formula we found online wasn't created for the data we were working with but because we are familiar with spreadsheet tools we were able to apply it to our data and use it as a solution to our problem great data analysts know how to find and use resources online to help them build new solutions to problems they face by using the thinking skills you've already learned in this program and using your knowledge of data analytics tools and terms you can too and once you've found some answers to your problems you can build them into your analysis work to get past any challenge you might face hello there in these videos you've been introduced to spreadsheets sql and so many other tools we've also talked about choosing the right tool before you start a project but sometimes you find yourself stuck on a problem during your data analysis that might mean it's time to reconsider which tool you're using for the job for example if you're working with a simple spreadsheet maybe five to ten rows in a few columns then pivot tables are a great way to visualize that data but if that spreadsheet is more than a million rows it'll start to crash making a pivot table hard to complete when you find yourself working with a huge spreadsheet that keeps crashing you might switch to sql to pull the data you need from different locations in a database instead of from a single spreadsheet you might remember that sql can handle trillions of rows of data and is now a standard language for working with database programs sql is great for querying updating and optimizing data but trying to analyze your data with only sql can get complicated as you continue to progress as a data analyst you might find yourself spending a lot of time building long nested queries and then debugging them it might be time to consider another tool r r is a new tool that you'll work with later on but for now i'll tell you a little bit about it so that you can start getting excited r is another programming language but it's not a database language like sql it's a programming language frequently used for statistical analysis visualization and other data analysis r is a little different from other tools we've been working with but it's a great complement for the tools you're already using with r you'll be able to analyze and visualize data in all kinds of new ways we'll talk about r more later on but i hope this sneak peek gives you an exciting first look having a variety of tools in your toolkit is important as a data analyst but just as important as knowing when to use them if you find yourself stuck on a problem it can be a good idea to take a step back and reconsider how you're approaching a task do you have too much data for a single spreadsheet switch to sql are you spending more time debugging queries than actually analyzing data maybe you should consider r you also know how to find answers online now so if you ever run into a problem and need to try a different tool a quick search can be really helpful there might be resources online or someone else may have had the same problem and posted about it this is great if you start feeling stuck on a problem and you might even find a new way to use a tool you're already familiar with and that brings us to the end of this module great job we've covered a lot of information we learned about converting and formatting data how to combine multiple pieces of data and how to search for help when you need support during your analysis coming up next you'll take on the weekly challenge as always feel free to go back over anything we've learned from these past videos then i'll see you for the next video good luck welcome back in the next few videos we'll explore something called data aggregation aggregation means collecting or gathering many separate pieces into a hole for example the milky way galaxy is an aggregation of stars dust and gases so data aggregation is the process of gathering data from multiple sources in order to combine it into a single summarized collection in data analytics a summarized collection or summary describes identifying the data you need and gathering it all together in one place for example let's say you have a cabinet full of different puzzles one day a shelf breaks and all the boxes topple over scattering the puzzle pieces everywhere to get each puzzle organized again you need to identify the pieces that correspond to each particular puzzle gather them together and put them back into their correct boxes only then can you work with these pieces and create a complete picture so in data the puzzle pieces represent the data that lives in different separate data sets getting them organized is the aggregation process then the piles of pieces that complete a single puzzle become your summary and finally putting those pieces back together is like analyzing them to gain important insights data aggregation helps data analysts identify trends make comparisons and gain insights that wouldn't be possible if each of the data elements were analyzed on its own for instance data on high school graduations for individual students can be aggregated into a single graduation rate for an entire class data can also be aggregated over a given time period to provide statistics such as averages minimums maximums and sums for example that same yearly graduation rate data can be aggregated once again into a summary that shows us graduation rates for districts states and countries here's another example let's say you had data on real estate sales in a particular neighborhood for each of the past 10 years if you aggregated all of that data you'd be able to discover the average price of a home in that area and how values have increased or decreased over time functions are a big help in making data aggregation possible you'll learn how to use some of the most common ones to create your summary soon in addition we'll talk about aggregating data using something called a sub query you've seen sql in action and you understand that a query is a request for information from a database so a sub query also called an inner or nested query is a query within another query after the next several videos you'll know how to aggregate data and understand the tools you'll be using along the way let's get started hi again in this video we'll prep our data for vlookup a data aggregation tool as you learned before data aggregation is the process of gathering data from multiple sources in order to combine it into a single summarized collection data aggregation can give you all kinds of information about the data you are looking at for example in marketing you can aggregate data from an ad campaign to see how it performed over time and for particular customers travel companies use data aggregation to figure out how much their competitors charge for a certain flight hotel room or rental car type then they can make sure they price their own products as competitively as possible one thing these businesses all have in common is that they can use vlookup to help them achieve these goals as a reminder vlookup stands for vertical lookup basically it's a function that searches for a certain value in a column to return a corresponding piece of information earlier we used vlookup to take the value in one cell and search for a match and another we were able to match a product code made up of numbers and letters that lived in one spreadsheet to the actual name of the product that lived in another but before any of that can happen we need to make sure our data's properly prepared as you've heard many times clean data is much more likely to give you accurate results so let's start with the first common data cleaning task different data types for example a data set might have dates formatted as numbers or numbers represented as text strings instead of numeric values when data is not in a consistent format or a format that the spreadsheet application recognizes vlookup won't know what to do with that data and it'll return an error earlier you learned how to convert numbers to dates using the format tool so now let's focus on converting text to numeric values to do this you could use the format menu to select a type of number but you could also use the value function value is a function that converts a text string that represents a number to a numerical value here's an example in this spreadsheet the numbers in column a are currently text strings we can confirm this by running a simple sum function the syntax is equal to sum open parentheses and then the items you want to add together here it's a2 to a4 the colon says we're including everything between these two references now you can add a closed parenthesis and press enter or you can click and drag on the cells you want inside the parentheses to save a little bit of time the result is zero that's because the function doesn't work on text strings but if we apply the value function it automatically converts that text to a numeric value to do that we'll type equals value than an open parenthesis inside we reference the cell whose value we want to convert in this case a2 so now if we close the parentheses and press enter you'll notice that the one two three is numeric if we drag it down the column the 456 and 789 also become numeric now we can test it by running another sum function we'll type equals sum and an open parenthesis then b2 colon and b4 so b2 b3 and b4 are included in the sum close the parentheses and press enter now it shows that the total is 1368. okay the next common error comes from having extra spaces in your spreadsheet as you've learned when data is copied from one source to another sometimes a few leading or trailing spaces tag along these can cause problems when using vlookup so we want to make sure to use trim during the data cleaning process trim automatically deletes any extra spaces added to the cell another typical mistake in vlookup which you can easily catch during data cleaning are duplicates if there are duplicate rows in the search it will return only the first match it finds as you learned before remove duplicates is a tool that automatically searches for and eliminates duplicate entries from a spreadsheet using remove duplicates as you saw in a video a little while ago is a great way to get rid of duplicates and help make sure you find the right record during the lookup it's always good to remember that clean data is the foundation that everything else is built on and vlookup can be a very useful data cleaning tool in the next video we'll keep exploring more ways you can use vlookup see you there when people start out in data analytics they often think that those of us who've been in the field for a while know everything but trust me we're all still figuring things out and a lot of the time that means troubleshooting troubleshooting has to do with asking the right questions and that's what we'll focus on in this video we'll learn how you can use troubleshooting to solve all kinds of problems to do this we'll need to talk about some of the limitations of vlookup and then practice fixing some of the most common problems that data analysts face some of the troubleshooting questions i like to ask myself how should i prioritize these issues trying to solve lots of problems all at once can feel overwhelming i find it helps when you take things one at a time next i ask in a single sentence what's the issue i'm facing this helps to clarify what's really going on so i don't get bogged down with extra details after all if you don't have a clear objective before looking at the data you can find just about anything it's always best to start with your own clear understanding of the situation then let the data tell you if you're on the right track or not the next question i ask myself is what resources can help me solve the problem the internet is one of the best resources out there if you have a question chances are thousands of others run into exactly the same thing so a quick search can be really helpful and it's good to remember that people are resources too don't be afraid to ask questions not only is it a great way to learn it can also help you build strong relationships with your colleagues and a final important question i think about how can i stop this problem from happening in the future if a new procedure or guideline can stop the same issue from popping up again that's a great time saver alright let's start by noting that vlookup only returns the first match it finds even if there are lots of possible matches to understand this imagine your data set as a phone book according to research some of the most popular names are james smith and maria garcia this means if you need information about the 100th maria garcia listed vlookup won't work it'll only give you results for the very first maria it finds something else to keep in mind is that vlookup can only return a value from the data to the right it can't look left good news there's a simple solution data analysts usually get around the problem by copying and pasting a column to the left of the data they want to look at this way the lookup value is in the leftmost column and the data they want is to the right of it here's another problem i see a lot let's say the first few rows of a vlookup have returned the correct result but when you drag the function down the column problems start popping up this is probably because the table array part of the function hasn't been locked or made absolute an absolute reference is a reference that is locked so that rows and columns won't change when copied you can fix this issue by wrapping the table array and dollar signs as you learned a while back the dollar sign controls how the reference will be updated they make sure that the corresponding part of the reference doesn't change something else that can throw off your vlookup results are version control issues in other words a function worked perfectly at first but then something in the spreadsheet it was referencing changed for example maybe a user inserted a column so now the columns in your function no longer direct vlookup to the right place when something like this happens it'll return an incorrect value there are a few actions data analysts can take to ensure this doesn't happen first lock the spreadsheet this stops other people from making changes to do this in sheets select data then protected sheets and ranges in other spreadsheet applications there are other tools that do the same thing next choose what you want to protect in this case we want to protect the entire sheet then you can set permissions to either show a warning or restrict who can edit choose only you then done but keep in mind there will be times when other people need to work in the spreadsheet so locking them out might make you pretty unpopular with your co-workers when that's the case you can use match which is a function used to locate the position of a specific lookup value and can help you with version control we won't get into that right now but just know that it's an option in case you ever need it the final problem we'll talk about has to do with exact and approximate matching when using vlookup you're likely to get different results depending on whether you enter the word true or false within your function true tells vlookup to look for approximate matches and false tells vlookup to look for exact matches so if a function looks like this it's telling vlookup to find the closest match to the text or number we're looking for it's important to note that vlookup starts at the top of a specified range and searches downward vertically in each cell to find the right value it stops searching when it finds any value that's greater than or equal to the lookup value that's why data analysts typically use false like this that way vlookup only returns the exact match to what you've entered in the lookup value vlookup is one of the most popular lookup and reference functions in spreadsheets it's also one of the trickiest coming up you'll learn about more of these common challenges everything you learn will help you run into fewer problems when you start using vlookup as a future data analyst hey welcome back so far we've checked out a few different tools you can use to aggregate data within spreadsheets in this video we'll cover how to use join and sql to aggregate data and databases first i'll tell you a little bit about what a join actually is and then we'll explore some of the most common joins in action let's get started join is a sql clause that's used to combine rows from two or more tables based on a related column basically you can think of join as a sql version of vlookup which we just covered there's four common joins data analysts use inner left right and outer here's a handy visualization of what each join actually does we'll use these to help us understand these functions joins help you combine matching or related columns from different tables when we learned about relational databases we refer to these values as primary and foreign keys primary keys reference columns in which each value is unique to that table but that table can have multiple foreign keys which are primary keys in other tables for example in a table about employees the employee id is the primary key and the office id is a foreign key joins use these keys to identify relationships and corresponding values an inner join is a function that returns records with matching values in both tables if we think about our tables as the circles of this venn diagram then an inner join would return the records that exist where the tables are overlapping for the records to appear in the results table there has to be key values in both tables the records will only merge if there's matches in both tables when you input join into sql it usually defaults to inner join so a lot of analysts will use join as shorthand instead of typing the whole query a left join is a function that will return all the records from the left table and only the matching records from the white table here's how you can figure out which table is left or right in english and sql we read from left to right so the table mentioned first is left and the table mentioned second is right you can also think of left as the table name to the left of the join statement and write as a table name to the right of the join statement in this diagram you'll notice that the entire left table is colored in and the overlap with the right table which shows us that the left table and the records it shares with the right table are being selected each row in the left table appears in the results even if there's matches in the right table right join does the opposite it will return all records from the right table and only the matching records from the left you can get the same results if you flip the order of the tables and use a left join for example select from table a left join table b is the same as select from table b right join table a finally there's outer join outer join combines right and left join to return all matching records in both tables this means it'll return all records in both tables if there's records in one table without a match it'll create a record with null values for the other table using joins can make working with multiple data sources a lot easier and it can make relationships between tables more clear but unfortunately they don't work in sql let's find out why here's an example let's say you're working with employee data across multiple departments we have an employees table and a department's table which both have some columns like department id we can use different join clauses to help us pull different data from our tables and aggregate it maybe we want to get a list of employees with their department name excluding any employee without a department id because the department id record is used in both tables we can use an inner join to return a list with only those employees as a quick reminder analysts will sometimes just input join for an inner join but for this example we'll write it out so to build this query we'll start with select and as to tell sql how we want the columns titled then we'll use from to tell it where we're getting this data in this case the employees table and then we'll input interjoin in the other table we're using which is departments we can specify which column in each table will contain the matching join key by writing on employees dot department underscore id equals departments dot departments underscore id now let's run it and there now we've got a list of employee names and department ids for the employees that have those ids but we could use left or right join to return a list of all employee names and their departments when available let's try both really quickly so this will start similar to the last query we'll put in select as and from again but this time we'll say left join and use on like we did with the last query when we execute the query we get back this new list with the employee names and departments but you'll notice there's null values these are places where the right table which is departments in this case didn't have corresponding values try right join just to test it out this query will be almost the same only difference is that we'll use the right join clause to return all of the rows from the right table whether they have matching values in the table to the left of the join statement or not in this case the right table is departments now let's try out one last join outer auto join will fetch all of the employee names and departments again this query will start a lot like the other ones we've done we'll use select as and from to choose what data we want and how we'll grab this from the employees table and put full outer join with the departments table to get all of the records from both we'll also use on again here now we can run this and we'll get all of the employee names and departments from these tables there will be nulls in the department.name column and the employee.name column enroll columns because we've joined columns that don't have matching values and there now you know how joins work joins are super useful when you need to work with data from multiple related tables they give you a lot of flexibility with how you combine and view that data and if you ever have trouble remembering what inner right left or outer join do just think back to our venn diagram we'll keep learning about aggregating data in sql next time see you soon hi it's great to have you back by now we've discovered that spreadsheets and sql have a lot of tools in common earlier in this program we learned about count in spreadsheets now it's time to look at similar tools in sql count and count distinct in this video we'll talk about when you'd use these queries and check out an example let's get started count can be used to count the total number of numerical values within a specific range in spreadsheets count and sql does the same thing count is a query that returns the number of rows in a specified range but count distinct is a little different countdistinct is a query that only returns the distinct values in that range basically this means count distinct doesn't count repeating values as a data analyst you'll use count and count distinct anytime you want to answer questions about how many like how many customers did this or how many transactions were there this month or how many dates are in this data set and you'll use them throughout the data analysis process at different stages for example you might need them while you're cleaning data to check how many rows are left in your data set or you might use count and count the stink during the actual analysis to answer a how many question you'll run into these kinds of questions a lot so count and count distinct are really useful to know but let's check out an example to see count and count distinct in action for this example we're working with a company that manufactures socks we have two tables warehouse and orders let's take a quick look at these tables before we start querying first we'll check out the warehouse table you can see the columns here warehouse id warehouse alias the maximum capacity the total number of employees and the state the warehouse is located in we'll pull up the top 100 rows of the orders table next we can use limit here to limit the number of rows per turn this is useful if you're working with large data sets especially if you just want to explore a small sample of that data set from this query we're actually going to start with a from statement so that we can alias our tables aliasing is when you temporarily name a table or column in your query to make it easier to read and write because these names are temporary they only last for the given query we can use our from statement to write in what our tables aliases are going to be to save us some time and other parts of the query so we'll start with from and use aliasing to name the warehouse orders table just orders let's say we need both the warehouse details and the order details because we want to report on the distribution of orders by state so we're going to join these two tables together since we want data from both of them an alias our warehouse table in the process in this case we're using join as shorthand for interjoin because we want corresponding data from both tables and now that we have the aliases in place let's build out the select statement that comes before from let's run that and there now we have data from both tables joined together and we know how to create these handy aliases now we want to count how many states are in our order data to do that we'll use count and count distinct now we can try a simple count query first we'll join the orders and warehouse tables in our from statement and in this case we'll start with select and count the number of states let's run this query and see what we get wait that's not quite right this query returned over 9 000 states because we counted every single row that included a state but we actually want to count the distinct states let's try this again with count distinct this query is going to look similar to the last one but we'll use distinct to cut out the repeated instances we got the last time we'll use the query we just built but replace count with count distinct in our select statement let's try this query that's more like it according to these results we have three distinct states in our orders data let's check out what happens when we group by the state column in the warehouse table which we'll call warehouse.state we'll use join and group by in our from statement so let's start there again and then group by warehouse state now let's build out our select statement on top of that we're still going to use count distinct and let's run it now we have three rows one of each state represented in the orders data and our count distinct on the number of orders sums up the account we ran earlier 9999 you'll find yourself using count and count distinct during every stage of the data analysis process so understanding what these queries are and how they are different is key great job and i'll see you again soon hey it's great to have you back in this video i'll introduce you to another kind of sql query sub queries a sub query is a sql query that is nested inside a larger query have you ever seen one of those nesting doll toys they're also known as matri oshkou russian nesting dolls sub queries are a lot like nesting dolls no really your larger query could have a sub query in it and then that sub query could have a sub query and then that subquery could have another subquery but when you stack them all together they make one query with subqueries you can combine different pieces of logic together because the logic of your outer query relies on the inner query you can get more done with a single query this means all of the logic is in one place which makes it more efficient and easier to read the statement containing the sub query can also be called the outer query or the outer select this makes the subquery the inner query or interselect the inner query executes first so that the results can be passed on to the outer query to use subqueries can get a little confusing because there are so many layers but if you keep in mind that the innermost query executes first it'll be easier to order your subqueries when you want them to execute subqueries can also be nested inside all sorts of other queries usually you'll find subqueries nested in from or where clauses let's try out some common subqueries we'll start with the subquery and a select statement using the bike sharing data from an earlier example for the first statement let's say we want to compare the number of bikes available at a station to the average number of bikes available we're going to use this query to pull the average number of bikes available then we're going to incorporate it as a sub query now let's build our outer select query we want to select the station id and the number of bikes available then we'll put the select query that's pulling the average number of bikes inside that outer query by using parentheses we'll also build from into the subquery before closing it with another parentheses and completing the outer query the end of the outer join query has asked to show what we want to call this column and a final from statement to indicate which table we're referring to now let's run it and there we've got a table with both the number of bikes available and the average number of bikes available at different stations it's really common to see sub queries nested and from and where statements so let's try those next we could use a from statement to calculate the number of rides that have started at each station over time we'll start with our outer query and input select station id name and number of rides we'll use as to tell it how we want the table labeled and from to tell it where we're pulling data from but before we finish that query we'll add a subquery we'll put our parentheses here and then select the start station id then we can tell it to count the number of rides from the trip data and group it by the start station id after that we'll close the subquery with the parentheses so that we can continue building the outer query we'll use as again and then use inner join and on to join it with the station id data and finally we'll tell it to put it in descending order let's see what happens when we run that we now have the number of rides started at each station okay one last example let's use a where statement the bike sharing company has two kinds of users subscribers and one-time customers let's say we wanted a list of stations subscribers used as always we start with the outer query select the station id and name from the public data set we're using and this time we'll use a where statement we'll also use n so that we can specify multiple values in this where statement then we'll put our subquery in the parentheses we'll add select from and where again but this time we'll tell it that we only want data on specific customers it's good to note that you can use comparison operators in subqueries even multiple row operators like in any or all in this case we'll use equals to indicate that we only want the subscriber user data now let's run the query and we've got the station id and names for stations that fit our criteria so that's sub queries in action subqueries can be challenging there's a lot of layers to think through and you might find yourself running into errors when you practice that's totally okay having to go through that challenge means you're growing if everything was easy we wouldn't find new ways to grow for me it's all about how much work and how much time i need to put in to do it so give yourself time to practice this new concept coming up you'll get a chance to use subqueries to aggregate data or you can move on to the weekly challenge you'll take everything you've learned like using vlookup different joins and subqueries and apply it to this upcoming assessment we've been doing a lot of complex work so if you want to take a moment to review these videos before moving on feel free and once you finish the challenge i'll see you again for our next big learning adventure see you soon [Music] hi i'm justin i work here at google in the google cloud space i lead a small team of data analysts who answer business problems for our executive team the first thing i would tell you about my journey to analytics was it was not direct i came to google three years ago and have been doing data analytics and really been enjoying that role and it's kind of tied together this through line of excitement about data and answering questions that have an impact your career path's not always straightforward maybe data analytics won't be my final destination but what i would say is just keep like changing little by little figure out what's exciting about your role right now in my case it was i loved you know avoiding politics and coalition building and really just bringing better facts and better insights to really motivate decisions so kind of figure out what you like about your current role your current job and then figure out what's what different role could you take that would build on that but maybe get you more of what you like be curious uh like the number one skill is really just asking why and then going and trying to answer that question and you'll like lead you down you know along whether it's wikipedia to understand this model that someone using saying why did they use that model you go and look up that model and sort of follow that thread or you know there's so many great resources for different languages if you want to understand sql there's so many great tools but i guess the number one thing is just sort of follow your curiosity so when i'm reviewing resumes the first thing obviously i'm looking for is those core skills the ability to analyze data demonstrated experience with some of the tooling we use but i'm also looking for a real passion in answering questions so examples where someone's really dug in and tried to understand the why and they just kept asking why is this happening why is this happening and really dig in hey there so we've got some experience nesting sub queries into our sql statements to perform more complicated queries now it's time to talk about how to aggregate data with sub queries coming up we'll learn about some new subquery statements and use them to aggregate data the query we're going to build in this video is pretty advanced it's going to be a little complicated but i know you've got what it takes let's get started we've used functions like where to filter our data before but the where function can't be used with aggregate functions for example you can use where on a statement and follow it with group by but when you want to use group by first and then use where on that output you'll need a different function this is where having comes in having basically allows you to add a filter to your query instead of the underlying table when you're working with aggregate functions that way it only returns records that meet your specific conditions similarly case returns records with your conditions by allowing you to include if slash then statements in your query so let's try to aggregate our data with subqueries and test out these new functions let's say we're working with a company that makes socks that we talked about earlier we've been asked to calculate what percentage of the orders are fulfilled by each warehouse basically we're interested in knowing which warehouses are delivering the most orders we've seen these tables before but as a quick refresher here's the orders table you can see the columns here order id customer id warehouse id order date and ship date and if we pull up the warehouse table we can check out its columns we have the warehouse id warehouse alias the maximum capacity the total number of employees and the state where the warehouse is located before we start building the rest of our query we'll want to alias our table names as a reminder aliasing is when you temporarily name a table or column in your query to make it easier to read and write this example query is a little bit more complicated than the ones we've seen before so aliasing will help save us some time we'll start by aliasing the warehouse table in our from statement the from statement in this query is near the end but we'll build this first so that we can use the alias everywhere else we'll simplify it to just warehouse for the rest of this query we know that we're going to join these tables together so let's add that while we're working on this part of the query anyway we're using a left join here because we want all the information from our warehouse data even if it doesn't show up in the orders table then we'll alias the orders table as part of this statement now both of our tables have temporary names we can use and we've already finished a join statement but before we can build the beginning of this query let's go ahead and add our group by statement after this join we'll group these by the warehouse id and name and now we'll go back to the beginning of the query we'll select the warehouse id then we'll use concat to combine the strings with the warehouse's state and alias as the warehouse name then we'll use count to get the number of orders per warehouse next we'll build in a subquery to pull the total number of orders placed across all warehouses we'll input select again and then write the subquery in parentheses we'll put an asterisk after count to indicate that we want to include everything from the orders table and finally we'll close out the subquery and use as to name this column total orders now that our subquery logic is complete we can use a case statement to create categories for our warehouses based on how many orders they will fulfill we'll represent these as percentages you should notice count in the statement a few times we'll start by saying when the number of orders from our order table is less than or equal to 0.2 then the table will say fulfilled 0 to 20 percent of orders and then we'll use when again to indicate that when the number of orders is greater than 0.2 and less than or equal to 0.6 it'll say fulfilled 21 to 60 of orders after that we can use else to have everything that doesn't meet the criteria of our case statement say fulfilled more than 60 of orders then we'll use end as to name this column fulfillment summary that brings us back to the portion of the query we've already written but we're going to add a having statement at the very end of this query our warehouse table has warehouses that are currently being built and we want to filter those out since they aren't fulfilling orders yet we can use having to only include warehouses that have at least one order now before we execute this query let's take a moment to look at the whole thing we have an outer select account sub query a case statement a join and having all wrapped into one query we've built a really complex query so let's run it to see the new table and there now we can easily identify what percent of our company's total orders are being fulfilled by each warehouse these warehouses met our criteria and we can see here in the fulfillment summary column the percentage categories we outlined in our case statement obviously since we included a having statement to specify only warehouses with at least one order there aren't any warehouses currently under construction in this table that really complicated query we wrote created the specific table of data we can use to easily compare how these warehouses are performing there you go that's a quick taste of what it's like to work with subqueries and data aggregation clauses like having encased paired with subqueries will help you build more and more complex queries which lets you do more and more complex things in sql hi good to have you back coming up we're going to be reviewing some familiar concepts and then using those concepts to explore new ones as a data analyst you'll use key tools and processes over and over but you'll also learn new things as you grow in your job it could be anything from building a new kind of analysis to a time saving shortcut when i first got to google i relied on just a couple of programs and tools to access data and do my analysis but i soon realized that i wasn't working as efficiently as i wanted to once i got comfortable pulling data and analyzing it using sql it allowed me to be a lot more efficient than before and the better i got at sql in pulling the data from data tables the faster i completed my analysis i was hooked over the next few videos i'll show you some ways to be as efficient as possible while completing calculations during your analysis we'll start by revisiting spreadsheets where we'll look at formulas for basic calculations then we'll move into conditional formulas that use the if function to check whether a condition is met through a calculation after that we'll explore the multifunctional sum product function try saying that five times quickly some product adds and multiplies all in one step so it's very useful next we'll take another look at pivot tables if you've skipped around and it's your first time learning about them you'll get to know all about them pivot tables have tons of uses including organizing your calculations we'll then pivot to sql pun intended we'll show how queries and calculations go hand in hand in sql we'll also look at temporary tables in sql which are helpful for temporarily storing your data during analysis we'll be covering lots of new concepts in these videos so feel free to hit the pause button at any time to think through the problem or steps to try it on your own and you can always review the videos as much as you need to so to recap we'll have a little bit of a review and then cover some all new concepts all about calculations are you ready good me too hey there you probably do a lot of calculations in your daily life maybe it's figuring out how to tip someone or balancing your budget you might do some of these calculations in your head or with paper and pencil or the calculator on your phone you might even have shortcuts you use to make the calculations easier you'll perform a lot of these calculations as a data analyst too but they'll involve more numbers and a wider range of calculations that's where you will put your data analyst tools to work we'll show you how you can use formulas in a spreadsheet to complete some of the more basic calculations formulas are one of the many shortcuts that data analysts use but rest assured even though they're shortcuts they still calculate with complete accuracy we've covered a lot of these calculations earlier in the program but if you skip that part and won a refresher we'll review them here these calculations will also be more advanced than the ones we've covered so far but they will also be closer to what you might use on the job we'll be using google sheets and this video but you can also use excel the steps might look a little different in excel but the outcomes will be the same okay let's try out some calculations with sales data from a discount store chain we'll look at data from one of the stores in the chain our objective use the existing sales data to find any trends this is a great way to see a lot of the ways formulas can be useful in your analysis we'll start by finding annual sales over the years 2011 to 2020. the data is already organized in columns by month and in rows by year but we don't have the total sales for each year yet we can use a sum function to help us figure that out we'll add the sales for 2011 first we'll add a heading for the annual sales column then we can type our sum function in a formula all formulas begin with an equal sign so we'll type that first followed by sum and then an open parentheses after the open parenthesis we need to tell the formula which cells are being added in this case we need data from the whole row which begins in cell b2 b2 is the cell reference we'll use instead of typing each cell one by one we can put them in the formula quickly by selecting cell b2 and dragging the fill handle across the row to the last cell with sales data m2 the fill handle is a tiny box in the corner of each cell you can use it for lots of things like selecting multiple cells for a formula or continuing a pattern across several cells the fill handle definitely qualifies as a shortcut all right now we'll complete the formula by closing the parentheses and pressing enter and just like that we've calculated the total sales for 2011. here's another shortcut we worked on in an earlier video we can use the formula we created to calculate the total sales for the other years in the data set all we have to do is drag the fill handle down the other cells in the annual sales column and we'll have total sales for the rest of the years in the data set let's say we also need to find the growth in annual sales from year to year this would be a good time to think through the problem before we try to solve it do we have the data we need to solve this not yet thinking backwards like this helps us plan out the steps to move forward the first step we'll need to do is calculate the total sales by year then we'll measure the rate of change between years we'll start by labeling a new column in this case we won't need to use a function or parentheses since we're only using data from two cells we can just use the name of those cells we'll type an equal sign and then click in cell n3 which automatically populates that cell in the formula next we'll add a minus sign to the formula because we're subtracting to find the difference between two consecutive years clicking in cell into gives us the total from 2011 which we can then subtract from the total from 2012. then we hit enter and get our sales growth from 2011 to 2012. we're definitely getting some useful data here so let's keep going we can also use our sales growth to find the growth rate between the two years we'll show this as a percentage so we'll head our column with the percent sign and growth to do this we'll divide the total in cell o2 by the annual sales from 2011 and sell into a slash is a symbol that a formula recognizes as division so we'll place that between the two cell references in presto there's the growth rate growth rates are usually shown as percentages which can be easier than a decimal to read and understand so let's change this number to a percentage time for another shortcut all we have to do is click the percent style button and our growth rate will become a percentage we can select the cells for both the total growth and the growth rate to populate the rest of the two columns we have some negative numbers but that just means that there was negative growth from one year to the next okay we've got just a few more things to calculate for our stakeholders next up is finding the average sales we want to compare sales between months to learn if there's a trend so we'll add this in a row instead of a column this will line up our averages under each month to find our averages we'll calculate the total and then divide that total by the number of values added to get it we can do this by using the average function between our parentheses we'll select the cells that contain the sales data for january b2 through b11 we'll duplicate that formula across the row through december to look for trends right away we know that summer months and december have the highest average sales since our stakeholders will want to understand our findings quickly and easily will add a little visualization to the data with conditional formatting conditional formatting is a spreadsheet tool that changes how cells appear when values meet specific conditions so let's apply conditional formatting to the cells with the average sales by month we'll use a color scale to show the range of averages with the lowest monthly average remaining as white and we'll apply shades of green to the rest of the values the brighter the green the higher the average now when we share our analysis with our stakeholders they'll be able to tell right away which months have the highest average sales okay just a couple more steps to complete our analysis now we need to find the minimum and maximum for average monthly sales with the data set this small it might be easy to find the minimum and maximum values without a formula but it's still good practice to use one not to mention using a formula helps prevent human error we'll again rely on formulas with functions to do these calculations we'll start with the lowest monthly average our function here is men followed by the cells with the average month b12 through m12 after we press enter the lowest monthly average is calculated we can repeat the same steps to find the highest monthly average in this formula we'll use the same data but we'll replace min with max for maximum so for this store location sales are strongest in december and weakest in january we could share these findings with stakeholders if they've met our objectives if they haven't we might need to continue with our analysis either way i hope you've learned how spreadsheet formulas can be valuable tools when doing calculations coming up we'll check out more formulas see you soon welcome back one of the first calculations most kids learn how to do is counting soon after they learn adding and that doesn't go away no matter what age we are we're always counting or adding something whether it's change at the grocery store or measurements in a recipe data analysts do a lot of counting and adding too and with the amount of data you'll come across as a data analyst you'll be grateful to have functions that can do the counting and adding for you so let's learn how these functions countif and sum if can help you do calculations for your analysis more easily and accurately we'll start with the countif function you might remember countif from some of the earlier videos about data cleaning countif returns the number of cells that match a specified value earlier we showed how countif can be used to find and count errors in a data set here we'll only be counting just a reminder though while we won't be actively searching for errors in this video you'll still want to watch out for any data that doesn't look right when doing your own analysis as a data analyst you'll look for and fix errors every step of the way for this example we'll look at a sample of data from an online kitchen supplies retailer our stakeholders have asked us to answer a few questions about the data to understand more about customer transactions including the revenue they're bringing in we've added the questions we need to answer to the spreadsheet we'll set up a simple summary table which is a table used to summarize statistical information about data we'll use the questions to create the attributes for our table columns count revenue total and average revenue per transaction each of our questions asks about transactions with one item or transactions with more than one item so those will be the observations for our rows we'll make quantity the heading for our observations we'll also add borders to make the summary table nice and clear the first question asks how many transactions include exactly one item to answer this we'll add a formula using the countif function in cell g11 we'll begin with an equal sign countif and an open parenthesis column b has data about quantity so we'll select cells b3 through b50 followed by a comma next we need to tell the formula the value that we're looking for in the cells we've selected we want to tell the data to count the number of transactions if they equal one in this case between quotation marks we'll type an equal sign and the number one because that's the exact value we need to count when we add a close parenthesis and press enter we get the total count for transactions with only one item which is twenty-five we can follow the same steps to count values greater than one but this time because we only want values greater than 1 we'll type a greater than sign in our formula instead of an equal sign getting this information helps us compare the data about quantity okay now we need to find out how much total revenue each transaction type brought in since the data isn't organized by quantity we'll use the sum if function to help us add the revenue for transactions with one item and with one more item separately sum if is a function that adds numeric data based on one condition building a formula with some if is a bit different than one with countif they both start the same way with an equal sign and the function but a sum if formula contains the range of cells to be evaluated by your criteria and the criteria in other words some if has a list of cells to check based on the criteria you set in the formula then the range where we want to add the numbers is placed in the formula if that range is different from the range being evaluated there's commas between each of these parts adding a space after each comma is optional so let's try this in cell h11 we'll type our formula the range to be evaluated is in column b so we'll select those cells the condition we want the data to meet is for the values in the column to be equal to one so we'll type a comma and then inside quotes an equal sign in the number one then we'll select the range to be added based on whether the data from our first range is equal to one this range is in column c which lists the revenue for each transaction so every amount of revenue earned from a transaction with only one item will be added together and there's our total since this is revenue we'll change the format of the number to currency so it shows up as dollars and cents so the transactions with exactly one item earned one thousand five hundred and fifty five dollars in revenue let's see how much the transactions with more than one item earned okay let's check out the results just like with our countif examples the second sum if formula will be the same as the first except for the condition which will make greater than one when we run the formula we discover that the revenue total is much higher four thousand seven hundred and thirty five dollars this makes sense since the revenue is coming from transactions with more than one item good news to complete our objective we'll do two more quick calculations first we'll find the average revenue per transaction by dividing each total by its count this will show our stakeholders how much of a difference there is in revenue per transaction between one item and multiple item transactions this information could be useful for lots of reasons for example figuring out whether to add a discount on purchases with more than one item to encourage customers to buy more we'll put these calculations in the last column of our summary table you might remember that we use a slash and a formula as the operator for division calculations the average revenue for transactions with one item is 62 dollars and 20 cents and the average revenue for transactions with more than one item is 205 dollars and that's it for our analysis our summary table now gives stakeholders and team members a snapshot of the analysis that's easy to understand our countif and some if functions played a big role here using these functions to complete calculations especially in large data sets can help speed up your analysis they can also make counting and adding a little more interesting nothing wrong with that and coming up we'll explore more functions to make your calculations run smoothly bye for now hi again data analysts love discovering new ways to work on their analysis especially when those new ways simplify their work i know i'm a big fan of learning new tricks to complete tricky tasks instead of trying to find a new way to do something every time i do an analysis i try to learn from other people by asking questions and getting help when i need it the people i work with like to use the phrase stealing with pride all this means is that you should feel no shame for using a process in your analysis that you learn from someone or somewhere else fellow team members message board posts online searches i've used all of these resources for ideas with pride of course i always cite my sources when i do that's a super important step to remember the sum product function is also one of those tricks that analysts come across either on their own or from another source you can also think of it as a shortcut for doing more complex calculations we'll show you how some product works and when you might use it to make your work life simpler some product is a function that multiplies arrays and returns the sum of those products and here's what the sum product formula looks like equal sign the sum product followed by an open parenthesis and arrays being multiplied and then added together each array is separated by a comma an array is kind of like a range in a spreadsheet but keep in mind an array is a collection of values in cells not the cells themselves so when added to a formula the sum product function multiplies each of the values in two or more arrays together for example each value in the array of cells b3 through b7 can be multiplied by its corresponding value in the array of cells c3 through c7 so b3 times c3 b4 times c4 and so on it'll then return the sum of all of those multiplications let's check out an example using data from a kitchen supplies company you might remember this example from our countif and sumif video we've been given some data about a product order including the quantity of each product that was sold in the order and the unit price which tells how much one of each product cost our job is to use the data in these two columns to find out the total revenue for this order and that's where sum product comes in to find the total revenue we need to do both addition and multiplication calculations so first off we need to find the revenue that each item brought in separately if we did this without sum product we'd have to multiply each quantity by its unit price 50 times a dollar 25 25 times five dollars and so on then we'd have to add all of those revenue amounts together to get the total revenue fortunately the sum product function does all of that for us let's add the label total revenue in cell g5 and then click g6 to input our formula we'll then start our formula with an equal sign and the function followed by an open parenthesis it's good to remind ourselves that the arrays we add to our formula should always be inside the parentheses next we'll select cells b3 through b7 for the first array followed by a comma the comma acts as a separator between the two arrays and the formula then we'll select cell c3 through c7 for the second array followed by a closed parenthesis to complete our formula we don't need to include the brackets in our actual formula we included them in the syntax example to clearly define each array for you then we press enter to get our total revenue since we're dealing with revenue we'll format the number as currency so we've learned the total revenue is 655 dollars but that's not the actual profit from the sales of these kitchen supplies because we haven't included the profit margin in our calculations the profit margin is a percentage that indicates how many cents of profit have been generated for each dollar of sale so in our data set product number 789 has a profit margin of 20 percent meaning each product sold earns a total profit of 20 cents for every dollar and just like the calculation for revenue we can save time finding profit margin by using the sum product function there's only one difference between the formula for profit margin and revenue in this spreadsheet but it's an important difference to start in cell g7 we type the same first part of the formula then we include the two arrays in the same way as well but instead of ending our formula we add another comma followed by another array this time we'll select the cells with the profit margin d3 through d7 we'll finish our formula and our calculation is complete the sum product function saved us from having to multiply each individual revenue amount by each profit margin percentage then add each profit margin amount together using some product for calculations is a time saver and helps you avoid making mistakes definitely a trick worth remembering and there's more worth remembering about calculations coming up next hey there by now we've learned a lot about functions and formulas they are very helpful tools for your toolbox and great for finding shortcuts to complete calculations but there's another tool out there that does some of the same things in a spreadsheet the pivot table as a quick reminder pivot tables let you view data in multiple ways to find insights and trends we've talked before about how pivot tables help with cleaning and organizing your data including sorting and grouping data but pivot tables can also help with calculations for example they're great for quickly calculating sums and averages let's revisit our movie data set to show you how pivot tables and calculations work hand in hand earlier we summarized and organized this data in pivot tables we'll do that here too but in this case the organization is a bonus to using pivot tables for calculations you can do these steps in excel as well though some of the steps might look different in this example your manager asks you to find some trends to help them think through new movie ideas using revenue calculations this spreadsheet has data about movies from several years ago so it probably wouldn't be as useful right now still the steps we take to analyze the data absolutely apply then and now so let's get into it first we need to find out how much revenue was generated each year a pivot table is a good way to organize this so we'll build a pivot table to show this in our pivot table we can also find the average revenue per movie we can then check our findings for some possible trends we'll start by finding the revenue generated each year this gives us the release date for each movie in column b and the box office revenue and column in instead of organizing the table by year and building a formula to calculate the revenue per year we'll create a pivot table we'll add the pivot table in a new sheet keeping the data range from cell a1 to cell n509 adding a new sheet is especially helpful when working in a large data set like this it helps keep our calculations together in one place and separate from the rest of the data we'll rename this new sheet revenue to call out where our calculations are both for ourselves and for anyone else on our team who might need our analysis now we can build our pivot table starting with the rows we'll sort the rows by release date to find the revenue for each year you might notice this creates a row for every date on which one or more movies in this data set was released since we only need the years we'll right-click in one of the cells in the release date column to create a pivot date group and we'll group by year now we have one row for each year in which these movies were released next we want to work with values we'll add the box office revenue data here this populates the column next to the release dates with a total box office revenue in each year these calculations are automatic because the pivot table is already set to summarize the data using the sum function so no need to change this setting there's other functions in the summarized by menu though like men for the minimum amount of revenue for each year and count for the number of movies that generated revenue in each year okay let's check out what we've got here this data shows that 2014 had the highest box office revenue while 2016 had the lowest this might be useful information but finding the average revenue per movie would most likely be more useful since there was a number of different movies released each year so we'll add another column for the average revenue earned by each year's movies we can do this in the same pivot table we'll add another value and change the function that we use to summarize from sum to average the average function gives us the average revenue per year for the movies in the data set we can see that the average revenue in 2015 was much lower than in other years since this data stands out so much let's keep exploring to find out why taking your analysis to the next level like this is a sign of a great analyst when you're in your job you want to answer the questions that your managers and stakeholders ask but you also want to answer the ones that come up while you're doing your analysis so let's try to figure this out first we'll note how many movies from each year were included in the data set we'll add a new value and use the count function this time this shows us that there's more movies in the data set from 2015 than from any other year but 2015 still has the second lowest total box office revenue this could mean a few things it's likely that a lot of the movies from 2015 just didn't earn much revenue compared to the other years which would bring down the overall average revenue even if the total revenue remained on par with the other years we'll explore just this one possibility here but you can always go further when you analyze data and your own job it'll depend on your objectives and the questions you need to answer for now let's copy and paste our pivot table so we can test our hypothesis we'll rename the columns in our copy table to differentiate them from our original table we'll name them based on the data we'll be looking at which i'll explain in a moment another great pivot table feature that will help us here is the filter option earlier you learn how this lets us filter data so we can look at only the values we need so we'll select a cell in our copied pivot table and add a filter to the box office revenue the filter will then be applied to the entire table when we open the status menu we can choose to filter the data to show specific values but in our case we want to filter by condition so we can figure out how many movies in each year earned less than 10 million dollars of revenue this is why we renamed our columns with a less than sign and an abbreviation for 10 million dollars the condition we'll use in our filter is less than and our value will be 10 million dollars we'll type our number in a dollar incense format so the condition matches the data in our pivot table this might not be necessary but it prevents potential errors from happening now we know that 20 movies released in 2015 made less than 10 million dollars this seems like a high number compared to the other years but keep in mind there were more movies from our data set released in 2015 than the other years it's still hard to tell how much of an impact these lower earning movies had on the average revenue so let's run a quick formula to find the percentage of movies for each year that earned less than 10 million dollars this will make it easier to compare them from year to year we'll create a customized column called a calculated field using our values menu a calculated field is a new field within a pivot table that carries out certain calculations based on the values of other fields you can do this in excel too using field settings and the create formula to get our average we'll divide the count from our movies under 10 million dollars while the count for all movies in our original pivot table after count and an open parenthesis between single quotes we'll type a less than sign and then the rest of our column name in a closed parenthesis then we'll type a slash as our division operator and we'll repeat the previous steps from our box office revenue count then we'll select custom from our summarize buy menu since we're not using the sum function that didn't work we have an error message in the calculated field it might be because we're using values from two different pivot tables let's first check that a calculated field will work in our copied pivot table we'll change our formula to prove that calculated fields work and to verify that the values in our average column are accurate for our new formula we'll use the sum function and divide the sum of the box office revenue data from our original table by the count of the same data since we applied our filter for this pivot table earlier the formula will only return the average revenue of movies under 10 million okay that worked so we know that it's not a problem with the calculated field and as an added bonus we were able to check the accuracy of some data always a good thing but we still need to find the percentage of movies for each year that earned less than 10 million dollars so instead of a calculated field we'll add a formula in a new column then we can use the values from both of the pivot tables we'll need for this calculation we'll add a heading for our table in cell g10 we'll name it percent of total movies then we'll add our formula to the next cell in the column we'll divide the number of movies in the copy table by the number of movies in the original table next we'll use the fill handle in the cell with our formula and drag it to apply the formula to the rest of the years and to cap it off we'll format these numbers as percentages now our analysis shows that 16 of the movies released in 2015 earn less than 10 million dollars of revenue the other years are all close to 10 this is one possible explanation for why the average revenue is comparatively low in 2015. in real life we'd most likely need to take our analysis even further and explore the data even more depending on our objectives but for this video our objectives are complete you've learned how you can use pivot tables to perform data calculations it'll take practice but pivot tables are worth it because they do more than calculate they organize and filter data too pivot tables are also used to form the framework for lots of data visualizations like graphs and other charts you'll learn more about data visualization soon for now though take a second to breathe we've covered a lot of complex ideas in the last few videos functions formulas and pivot tables are all great tools to use in your analysis and with practice and experience it'll feel like you've used them forever just take your time getting to know how they work use these videos and other resources like the readings we've included to explore them further in the meantime we'll discover how the spreadsheet calculations you've been learning about connected calculations using sql stay tuned by now you probably know that there's more than one way to do the daily task of a data analyst calculations are no exception as we've shown in earlier videos you can complete the same calculations in lots of different ways and spreadsheets you can also complete them using sql in this video we'll give you an overview of how sql calculations compare to spreadsheet calculations let's look at the arithmetic operators that are used in both spreadsheets and sql an operator is a symbol that names the type of operation or calculation to be performed in a formula as you learned earlier the four basic arithmetic operators in spreadsheet formulas are the plus sign for addition the minus or hyphen for subtraction the asterisk for multiplication and the forward slash for division these same operators calculate data in the same way when writing queries in sql the operators are embedded in the queries when pulling data from a database just like spreadsheet formulas there's a few different ways to perform calculations using queries let's look at the syntax for one possible query the syntax of a query is its structure it should include all the specific details of the data you want to pull into a new table where those details should be placed so if you want to add values from two columns of a table you start with the select command followed by the name of your first column then the name of your second column then you'd add the names of both columns with a plus sign between them after that you type as followed by the name you'd like to give the column with the added totals finally you then complete your query by typing from and then the name of the table that you're pulling the data from running this query would get you a table showing the two columns whose values are being added together plus a new column showing the sums of those values the operator in this query is a plus sign since values are being added if you needed to subtract multiply or divide you'd follow the same steps using the appropriate operators if you need to use more than one arithmetic operator in a calculation you'd use parentheses to control the order of the calculations so if we included column c in our query we could place parentheses around column a plus column b we then add an asterisk if we're multiplying followed by column c this query would return in a new column the sum of the values in column a and b multiplied by the values in column c all right now let's say you only wanted the remainder from a division calculation well you need a different operator for this the modulo operator the modulo operator is represented by the percent symbol this is an operator that returns the remainder when one number is divided by another in a spreadsheet you could complete the same calculation using the mod function this brings us to another similarity between calculations in spreadsheets and sql a lot of times you can use functions instead of operators to complete calculations for example the sum function can complete addition problems in spreadsheets and sql the average function in a spreadsheet is the same as the avg function in sql they both return the average value of a set of numbers in sql these functions are considered aggregate functions because they perform a calculation on one or more values and return a single value you'll learn more about how they're used with the group by command and a query soon so those are the basics of sql calculations knowing how to write a query for a calculation is a good first step stay with us and you'll learn more about calculations in sql bye for now hi again earlier we showed you how to complete calculations in sql while there's a few different ways embedding them in queries is a very useful one when you include a calculation in a query with other commands you can do more work faster here's a basic query syntax that we talked about we start with select and then the names of the columns we want to use in our calculations then we add in calculation details including an operator like a forward slash for division next we type as followed by the new columns name to label the column with the calculated values finally we end our query with the from command and the name of the table that we're pulling data from now let's take it to the next level with some embedded calculations that use a syntax like this one even better we'll do this with some data about avocados sorry to those of you who aren't avocado lovers like me okay let's get started feel free to continue watching as we show you the steps using bigquery if you're joining us open up your tool of choice for using sql be sure to look through the instructions in the reading right before this video to help you get started okay data's already loaded so we can jump right in our goal is to find out the total number of bags of avocados sold on each date at each location using this data there's already a column that shows us the total but we want to make sure we understand how that total is calculated we want to make sure that the total column is just small large and extra large bags added together so we'll add the values in those three columns together in our query and then compare them to the total bags column in the data set we'll start with the select command which we'll use to pull certain columns from the table we are selecting several columns so we'll press enter after select and after the comma after each column name next we'll type those column names date region small underscore bags large underscore bags extra large underscore bags in total underscore bags underscores are the lines used to underline words and connect text characters using spaces can confuse certain servers and applications so using underscores instead helps avoid potential issues while keeping the names readable now we'll add the calculation to the query using the names of the three columns with plus signs between them small bags plus large bags plus extra large bags since we want this calculation in a new column we'll use the as command to name the column total underscore bags underscore calc we've added the word calc so we can compare the columns to each other after we calculate our results now we'll finish our query with from and the name of the data set and subset we're pulling from avocado underscore data dot avocado underscore prices okay let's run the query in the total bags calc column the data shows the sum of each date for the number of small large and extra large bags of avocados that were sold at each location if we quickly compare the two columns showing the total number of bags we learned that the values are the same this lets us know that the data we want to use is the right data now that we have verified the total number of bags we can use those values in another query we need to find what percent of the total number of bags were small bags finding this out might help stakeholders make decisions on how to package avocados or which size bag to run a sale on our job is to get that information to the stakeholders so we'll set up a new query we'll select the date region total bags and small bag columns for this query next we'll set up a new column starting with our calculation to find the percentage of small bags we need to first divide the number of small bags by the number of total bags using a slash as the operator we'll put this part of the calculation in parentheses to let the server know that this calculation should be performed first then we'll multiply this total by a hundred using an asterisk as our operator multiplying by 100 gives us a value that's a percentage instead of a decimal percentages usually make it easier for people to understand quickly when you share results with them we'll use the as command to name this new column small bags percent then we'll add from and the name of the set we're pulling from and we'll run our query oops we got an error in our results it says that we can't divide by zero since we're finding percentages dividing by zero won't work this means that somewhere in the data set there's a total bags equal to zero we'll have to fix this in our query we can fix this using the where command where lets us add a condition to our calculation so after we type where we'll type total underscore bags followed by a less than sign and then a greater than sign these symbols tell the server that the values we're calculating should not be equal to the value we specify in this case that value is 0 so we'll add a 0 to our query now when we run the query you'll notice our new column shows the percent of small bags in the total bags count we'd get the same results if we used an exclamation mark followed by an equal sign in place of the less than and greater than signs note that this is one way for doing it but there are functions such as safe divide that also allow you to avoid this error those are just a couple of examples to get you started but with sql you can complete just about any calculation you want during your analysis and embedding the calculations in your queries will help you keep your analysis organized while getting your results the calculation methods we showed you here are just the beginning so look for more coming up see you soon hey good to see you as a data analyst you'll find that your calculations come in all shapes and sizes earlier we showed you how to do some of the more basic calculations in sql while basic calculations are great sometimes you'll need to group data before completing calculations the group by and order by commands help you do this these commands are usually paired with aggregate functions like sum or count we'll show you how you can use these commands and functions to calculate and summarize data from groups of rows in a table let's explore the group by command first group by is a command that groups rows that have the same values from a table into summary rows the group by command is used with select statements in a basic select from or select from where query group by comes at the end of the query alright let's try using group by we'll work with a database with data from a bike sharing system we want to find out how many rides people took on these bikes per year this data has several columns but for this task we only need the start time column since this data set isn't organized by date and the start time column isn't organized by year we'll need to include steps in our code to organize it we also want a total number of rides each year so we'll need to include a calculation in our query for this and depending on the questions we've been tasked with answering this might be the first step of many in our analysis we'll start our query with the select command we'll then add extract to our query the extract command lets us pull one part of a given date to use we'll extract the year from the start time column to do this we'll add an open parenthesis followed by year which lets the server know the part of the date we need then we'll add the from command and start time so we can get the year from all of the start times in that column we'll close the parentheses and then use as and the word year to name the column we're creating on the next line of the query we'll use the aggregate function count followed by an asterisk in parentheses this will count the bike rides in the start time column using the asterisk make sure that all the start times are counted in the data then we'll name our column number of rides with underscores between each word instead of spaces we'll add from and the database we're pulling on the next line in this case that's bigquery dash public dash data dot new underscore york dot city bike underscore trip and here's our group by command we'll use this to group the data by year so we'll type group by followed by year we can further organize our results by using the order by command adding this after group by orders the results we'll add year to order the data by year it's good to note that by default order by sorts data in ascending order now we can run our query to get the results the years are ordered starting with 2013 and ending with 2016. if we want to change this to descending order we can add the keyword d e s c to the end of the query and run it again but whichever order you use the group by and order by commands are great for helping us complete and organize a calculation for our analysis this is one way to include calculations when aggregating data and it's just one of many ways that sql helps keep your analysis running smoothly and moving forward there's still more to come on calculations in sql coming up we'll learn more about data validation see you soon hi again earlier we covered data validation a spreadsheet function that adds drop down lists to cells using data validation lets you control what can and can't be entered into your worksheet one of its uses is protecting structured data and formulas in your spreadsheets but as useful as it is the data validation function is just one part of a larger data validation process this process involves checking and rechecking the quality of your data so that it is complete accurate secure and consistent while the data validation process is a form of data cleaning you should use it throughout your analysis if this all sounds familiar to you that's good ensuring you have good data is super important and in my opinion it's kind of fun because you can pair your knowledge of the business with your technical skills this will help you understand your data check that it's clean and make sure you're aligning with your business objectives in other words it's what you do to make sure your data makes sense keep in mind you'll build your business knowledge with time and experience and here's a pro tip asking as many questions as possible whenever you need to will make this much easier okay let's say we're analyzing some data for a furniture retailer we want to check that the values in the purchase price column are always equal to the number of items sold times the product price so we'll add a formula and a new column to recalculate the purchase prices using a multiplication formula now comparing the totals there's at least one value that doesn't match the value in the purchase price column we need to find an answer to help us move forward with our analysis by doing some research and asking questions we find that there is a discount of 30 percent when customers buy five or more certain items if we hadn't run this check we could have missed this completely you've learned that as an analyst calculations are a big part of your job so it's important that whenever you do calculations you always check to make sure you've done them in the right way sometimes you'll run data validation checks that are common sense checks for example let's say you're working on an analysis to figure out the effectiveness of in-store promotions for a business that's only open on weekdays you check to make sure that there's no sales data for saturday and sundays if your data does show sales on weekends it might not be a problem with the data itself it might not even be a problem at all there might be a good reason like maybe your business hosts special events on saturdays and sundays then you would have sales for those weekends you still might want to leave out the weekend sales in your analysis if your objective is only to look at the weekdays but doing this data validation might save you from miscalculations and other errors in your analysis you should always do data validation no matter what analysis tool you're using in an earlier video we used sql to analyze some data about avocados one of the queries was a check to make sure the data showing the total number of bags was the sum of small large and extra large bags by running this query we were able to determine that the total number column was accurate we compared our two columns briefly in that video but to be absolutely sure that there's no issues with the data values in those columns we could have also run another query in this query we would select all using the asterisk and from the avocado prices data set in our where clause we'd also type out where our calculated total does not equal the total bags column if no values are returned we can be sure that the values in the total bags column are accurate and that led us to continue our analysis but when we tried to find what percent of the total number of bags was small we ran into a small problem we received an error message about dividing by zero we fixed that error by adjusting our query if we had linked that query to a presentation that went to our stakeholders they'd show us the divide by zero error instead of the figures we wanted by building in these types of checks as part of your data validation process you can avoid errors in your analysis and complete your business objectives to make everyone happy and trust me it's a great feeling when you do and another great feeling is knowing that you've made it through another video and learned something new and we have more where that came from coming soon see ya hello again now if you're like me you always have sticky notes available nearby to write a reminder or figure out a quick math problem sticky notes are useful and important but they're also disposable since you usually only need them for a short time before you recycle them data analysts have their own version of sticky notes when they're working in sql they're called temporary tables and we're here to find out what they're all about a temporary table is a database table that is created and exists temporarily on a database server temp tables as we call them store subsets of data from standard data tables for a certain period of time then they're automatically deleted when you end your sql database session since temp tables aren't stored permanently they're useful when you only need a table for a short time to complete analysis tasks like calculations for example you might have a lot of tables you're performing calculations on at the same time if you have a query that needs to join seven or eight of them you could join the two or three tables having the fewest number of rows and store their output in a temp table you could then join this temp table to one of the other bigger tables another example is when you have lots of different databases you're running queries on you can run these initial queries in each separate database and then use a temp table to collect the results of all of these queries the final report query would then run on the temporary table you might not be able to make use of this reporting structure without temporary tables they're also useful if you've got a large number of records in a table and you need to work with a small subset of those records repeatedly to complete some calculations or other analysis so instead of filtering the data over and over to return the subset you can filter the data once and store it in a temporary table then you can run your queries using the temporary table you've created imagine that you've been asked to analyze data about the bike sharing system we looked at earlier you only need to analyze the data for bike trips that were over 60 minutes or longer but you have several questions to answer about the specific data using a temporary table will let you run several queries about this data without having to keep filtering it there's different ways to create temporary tables in sql depending on the relational database management system you're using we'll explore some of these options soon for this scenario we'll use bigquery we'll apply a with clause to our query the with clause is a type of temporary table that you can query from multiple times the with clause approximates a temporary table basically this means it creates something that does the same thing as a temporary table even if it doesn't add a table to the database you're working in for others to see you can still see your results and anyone who needs to review your work can see the code that led to your results let's get this query started we'll start this query with the with command we'll then name our temp table trips underscore over underscore 1 underscore hr then we'll type the as command in an open parentheses on a new line we'll use the select from where structure for our subquery we'll type select followed by an asterisk you might remember the asterisk means you're selecting all the columns in the table now we'll type the from command and name the database set we're pulling from bigquery dash public dash data dot new underscore york dot city bike underscore trips next we'll add a where clause with the condition that the length of the bike trips we need in our temp table are greater than or equal to 60 minutes in the query it goes like this trip duration space greater than sign equal sign space 60. finally we'll add a closed parenthesis on a new line to end our sub query and that sets up our temporary table now we can run queries that'll only return results for trips that lasted 60 minutes or longer let's try one since we're working in our version of a temp table we don't need to open a new query instead we'll label our queries before we add our code to describe what we're doing for this query we'll type two hashtags this tells the server that this is a description and not part of the code next we'll add the query description count how many trips are 60 plus minutes long and then we'll add our query select then on a new line count with an asterisk in parentheses and as followed by cnt to name the column with our count next we'll add from and the name we're using for our version of a temporary table trips over one hour when we run our query the results show the total number of bike trips from the data set that lasted 60 minutes or longer we can keep running queries on this temp table over and over as long as we're looking to analyze bike trips that were 60 minutes and over and if you need to end your session and start a new run time later most servers store the code used in temp tables you'll just need to recreate the table by running the code when you use temporary tables you make your own work more efficient naming and using temp tables can help you deal with a lot of data in a more streamlined way so you don't get lost repeating query after query with the same code that you could just include in a temp table and here's another bonus to using temp tables they can help your fellow team members too with temp tables your code is usually less complicated and easier to read and understand which your team will appreciate once you start to explore temporary tables on your own you might not be able to stop don't say i didn't warn you coming up we'll explore even more things you can do with temp tables see you soon hello there earlier we introduced you to temporary tables they're a great resource to use during your analysis because they help you keep your sql code organized and efficient you learned how to use a with clause to create a type of temporary table and now we'll get into some other ways you can create temp tables along with the pros and cons they present and that's one of the great things about data analytics there's almost always more than one way to get your analysis done the select into statement is a good example of how to get a temp table done this statement copies data from one table into a new table but it doesn't add the new table to the database it's useful if you want to make a copy of a table with a specific condition like a query with a where clause so far we've been using bigquery to show you how sql works but bigquery doesn't currently recognize the select into command instead here's an example of how a select into statement might look in another rdbms in this statement a new table named africa sales is created using the data from the global sales database about the african region using select into is a good practice when you want to keep the database uncluttered and you don't need other people using the table now if lots of people will be using the same table then the create table statement might be the better option this statement does add the table into the database so if everyone needs access to the africa sales table your query would start with create table followed by the same select from where query as in the select into statement in most relational database management systems or rdbms you can add metadata to describe the data that's contained in the table you've created this can help make the table easier to understand for anyone using it the create table statement is also useful for tables that are more complex for example if the code's difficult to replicate the making attempt table in this way means it'll be safe for you to access later the way you create a temporary table using the with clause or a select into or create table statement is usually up to you and your needs the more you work in sql the more you might have preferences as well especially since there's more than one way to create temporary tables you may also find that you're working in an rdbms that uses a different syntax for example you might need to use a create temp table statement instead of create table here's some good news the syntax that you need for each unique rdbms is usually pretty easy to find with a quick online search but no matter how or where you create temporary tables there isn't much downside to them it's good to note though that sometimes building a temp table can interrupt your workflow again that'll depend on your objectives and your preferences you can repeat your code over and over instead of making a temp table but that usually leaves your queries less readable and more vulnerable to typos as you continue exploring the world of data analytics you'll find that temporary tables are just one of the many resources you'll be able to use and the more you use them the easier it'll be to navigate that world completed another course congrats we've still got some more to learn but you can always go back and review what you've covered already coming up you'll meet my colleague kevin kevin can't wait to show you everything the next part of the program has to offer and now that you're familiar with lots of different ways to analyze data you'll be ready to share your analysis results through visualizations and reports up next we'll focus on helping you develop your data visualization skills with tools like tableau that'll help you get your visualizations looking great you'll also learn how sharing your data is really about telling the story of your data and how it answers the questions that prompted your analysis finally you'll learn what it takes to make an effective presentation and how to manage and respond to the questions thanks so much for joining me in exploring the world of data analytics you've made it this far so be proud and keep it up as you move to the next course
Info
Channel: Free Engineering Courses
Views: 3,109
Rating: 5 out of 5
Keywords:
Id: 2cvVnvkS-IQ
Channel Id: undefined
Length: 192min 39sec (11559 seconds)
Published: Sun May 23 2021
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.