Intro to Data Formatting in Spreadsheets | Google Data Analytics Certificate

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
this video is part of the google data analytics certificate providing you with job-ready skills to start or advance your career in data analytics get access to practice exercises quizzes discussion forums job search help and more on casera and you can earn your official certificate visit grow.google datacert to enroll in the full learning experience today [Music] hey there so far we've learned about typecasting data with sql as a way of converting data from one type to another in databases now i want to check out another way to format data types within spreadsheets in this video we'll talk more about why making sure your data is formatted properly is so important and how to format numbers and convert units of measurement in your spreadsheets let's get started sometimes you need to convert data when you're working with spreadsheets that might mean changing numbers into dates strings percentages or even currency it's important to double check that all of your data is in the right format for your analysis sometimes even after cleaning and processing data it still might not be in the right format you need let's think back to the table with movie data from before there were a lot of different data types that included numbers such as dates budgets and text strings like actors names these are distinct values but the spreadsheet doesn't always automatically know that here's an example let's say you wanted to sort the movies in this spreadsheet by most recent if the spreadsheet casts them as strings instead of dates it might sort them alphabetically until you change the data type you won't be able to sort them the way you want it's also possible that your data sets contain inconsistent units of measurement that you'll need to convert like say a table that includes both u.s dollars and english pounds that's why it's important to check those data types again so you don't run into any problems during the actual analysis think about the incorrectly cast states in our movie table if your boss needed a list of the 20 most recent movies but your spreadsheet was organized alphabetically instead of by the most recent you wouldn't be giving her the list of movies she needed incorrectly formatted data can lead to time-consuming mistakes and your analysis and might end up affecting your stakeholders decision making but taking the time early on to convert and format your data can help you avoid that and now that you know why you'll need to convert data types while working in spreadsheets let's find out how first let me show you a really useful menu for specifying data types and spreadsheets here's the movie data table we used before but now the money columns aren't typed as currency on the toolbar at the top of the sheet you'll find a menu that can help you convert these numbers into specific data types it gives you a lot of choices just from the drop-down menu such as number currency date percentage and if you click to open the full menu there's even more options including one for a custom number format we know that we want these columns to be in currency format so let's do that all i have to do is select this column and then hit the currency shortcut and now it's all typed correctly but it doesn't stop there you can go even further and convert the unit of measurement you're using for this example let's check out a different table imagine that you're working with a weather channel to gather data about daily temperatures you have a table with some data about daily observations on the temperature wind speed and precipitation in this area right now the temperatures are in fahrenheit but for your analysis you need them to be in celsius no problem all you need to do is use the convert function to change the unit of measurement we'll use this empty column here here's the first temperature in the table we'll input the convert function in our new column to change it to celsius then we need to put what cell we want converted and finally we're going to convert impresto now this cell has the right unit of measurement for your analysis you can simply apply it to the rest of this column now this temperature data is all in celsius and your unit of measurement is consistent across the table and here's another tip when adding data to tables using a formula go back and paste the data in as values afterwards that way they're locked in otherwise the cell stays as a formula and could get confusing when you start working with the data so let's do that now we'll copy the values and then right click in a new column there's an option for paste special and there's an option to paste values only and now we have the static values in this column making sure your data's in the right format before you start analysis is so important do this and your analysis will return the kinds of answers you're really searching for and now you know some ways to typecast numbers and convert units of measurement in spreadsheets you can feel confident your data is formatted the right way next up we'll talk more about adjusting your data for analysis and data validation see you soon [Music] while we're learning about formatting data i want to talk to you about another spreadsheet feature data validation in this video i'll teach you a little bit about data validation and show you how to use it for now when i say data validation i'm talking about the function which is different from the data validation process we'll get into that later on but first let's talk about what data validation does in spreadsheets basically it allows you to control what can and can't be entered in your worksheet usually data validation is used to add drop-down lists to cells with predetermined options for users to choose from if you have a spreadsheet with a lot of collaborators this can make it easier for them to interact with your table you can think of it like a multiple choice question on a quiz and since you can control what's being entered into the worksheet it cuts down on how much data cleaning you have to do later on let's figure out how we might do that for this example we'll work on a project with a lot of milestones and deadlines to keep track of let's say our team has a spreadsheet that tracks everyone's progress but instead of making everyone write in where they are on their task individually we can provide a drop down menu with multiple options like not yet started in progress and ready so we'll select the column that we want to add the drop down menus to in this case the status column then we'll go to the data pull down menu here at the top and click data validation this brings up a pop-up menu with options for data validation in this case we know that we want to add a list of items for other users to choose from so we'll select the list of items option and the possible criteria and type in the selections we want to create then hit save and now all of those cells have drop down menus that we can use to easily mark progress for each task but there's other things that you can do with data validation and spreadsheets too like creating custom checkboxes to do this let's select the cells under the review column to make a checkbox that will let us know if tasks have been approved or not we'll go back to the data validation menu but instead of choosing list from a range we'll choose check box there's an option to use custom cell values let's choose that and put in approved and not approved now these tasks can be checked off by whoever's reviewing them like a project manager for example another way we can use data validation is to protect structured data and formulas the more people that are working together in a spreadsheet the more likely someone can accidentally break a formula but good news the data validation menu has an option to reject invalid inputs which helps make sure our custom tools will continue to run correctly even if someone puts the wrong data in by mistake all right now you know three uses for data validation in your spreadsheets adding drop down lists creating custom check boxes and protecting structured data and formulas data validation can help your team track progress protect your tables from breaking when working in big teams and help you customize tables to your needs coming up we'll learn more about conditional formatting and some ways you can use conditional formatting and data validation together see you soon so earlier we talked about conditional formatting as a spreadsheet tool that changes how cells appear when values meet specific conditions this lets you add visual cues to your spreadsheets that make it easier to understand your table at a glance and it makes the information in the worksheet clearer to your stakeholders in this video we'll take that even further by combining conditional formatting and data validation to create custom tools for our spreadsheets so far we've used conditional formatting to highlight empty cells that still needed data so that we could quickly pinpoint what information our table was missing and added in now let's build on that by using it to make our scheduling table easier to read at a glance here's a table we worked with when we covered data validation it's tracking the status of different tasks on our project for our team to check on but now there's even more tasks than the last time we looked at it this table has useful information but it takes a second to understand right now we don't have a visual on how many tasks are in progress or how many upcoming deadlines there are but if we color coded those elements of the table we could quickly see key pieces of data really easily let's start with the status column column c in the last example we created these drop down menus with the data validation tool now we can use conditional formatting to add some color let's go to the conditional formatting option under the format menu this brings up a sidebar where we can select our range rule in formatting style we need to decide which rows to apply our formatting to when the condition we set is met we can click this button in the range options to select all of the rows we're applying the formatting to instead of typing it in now that we have those cells selected we can choose the rule that we want to apply to these cells we already have drop down menus with specific text so we can choose format cells if text is exactly from the rules for our first rule let's write not yet started as the text condition then we'll choose a color to apply to those cells that have not yet started in them let's use red now all cells that have not yet started selected from the drop down menu will be red let's hit the add another rule button to add conditional formatting to our other status options let's add the condition in progress next we can make that one yellow and then we'll add one last rule for ready let's choose green and there now we have an easy to understand visual cue that tells us how many tasks are in progress and how many are completed we can also combine data validation and conditional formatting to track upcoming deadlines we have a column of dates called review by this date first let's use the data validation functionality to make sure users only enter valid dates we'll go back to the data drop down at the top pull up data validation and select date as our criteria then we can go to the format menu at the top go down to conditional formatting and open the sidebar again we'll click the select range icon and select the review by this date column now under format rules we can select data's after which will give us another option let's choose today and finally let's choose the color for these cells so if the date listed in these rows is after today it'll be filled in orange you can also choose a specific locked date if needed but for now let's go with today now all of the upcoming review dates have an easy to see color code so anyone using this table can quickly reference these deadlines you'll find that some spreadsheet programs like excel have built-in color codes that you can use too and there you go now you know how to use data validation and conditional formatting to create custom tools and visual cues that make your information easy to understand there's a lot of different ways to use these tools so feel free to experiment with them in your own spreadsheets coming up we'll keep learning about new tools for spreadsheets and sql bye for now in this video we'll build on what we learned about concatenate and import range by exploring a new sql query concat you might remember that concatenate is a function that joins together two or more text strings and as a quick reminder a text string is a group of characters within a cell most often composed of letters you've seen how this works within a single spreadsheet but there's a similar function in sql that allows you to join multiple text strings from multiple sources concat let's use concat to combine strings from multiple tables to create new strings for this example we'll use open data from citibike which is a public bike sharing system in new york as you learned earlier open data initiatives have created a ton of data for analysts to use openness or open data is free access usage and sharing of data it's a great resource if you want to practice or experiment with the data analysis tools you've been learning here we have open access to the new york city bike sharing data which has information about the use of the shared bikes across the city so now we can use concat to pull and concatenate data from the different columns stored here the first thing we need to do is figure out which columns we need that way we can tell sql where the strings we want are for example the bike sharing company has two different kinds of customers one-time paying customers and subscribers let's say we want to find out what routes are most popular with different user types to do that we need to create strings of recognizable route names that we can count and sort we know that the information we need is in the station and trips table so we'll start building our query accordingly first we'll input select user type to let sql know that we want it as a column then we'll use concat to combine the names of the beginning and ending stations for each trip in a new column we'll type in as route to make a route column this will make these route names easy for us to read and understand after that we'll want sql to count the number of trips so we'll input count and an asterisk to tell it to count up the number of rows in the data we're selecting where each row represents a trip in a new column now let's also create a column for trip duration which we can just round up notice how our sql functions first cast the values in the trip duration column as integers then calculates the average value for the trip duration and then rounds that value to two decimal places next we'll tell sql where this information is stored so we'll use from and the location we're pulling from since we're using count and average functions in our select clause we have to use group by to group together summary rows let's group by the start station the end station and the user type for this query finally we'll use order by to tell it how we want to organize the data for this we want to figure out the most common trips so we can input the number of trips column and use des to put it in descending order finally we only want the top 10 so let's add limit 10 and now thanks to a concat we can easily read the route names and trace them back to real places we can see which kinds of customers are taking which routes which can help the bike sharing company understand their user base in different parts of the city and where to keep more bikes for people to rent being able to combine multiple pieces of data can give you new ways to organize and analyze data and there's a lot of different tools to help you do that now you've seen concat in action and later you'll come across another similar query join but up next we'll talk more about working with strings see you soon [Music] so far we've worked with strings in both sql and spreadsheets before and we've learned that they usually have similar functions in this video we'll take another look at lin left right and fine you've come across these functions used in sql but now you'll find out how they work in spreadsheets going back to our bike sharing data set let's check out one of their spreadsheets this is one of the trip data spreadsheets in the start time and stop time columns there are strings that contain information about date and time of each ride this is all useful data but chances are we'll only need part of these strings to make a formula or answer a question for example these strings contain multiple data points like date and time but if we're trying to find the average time between start times we won't need the date we can actually use lin left and right and find to split the timestamps into separate columns if we want let's build a simple formula to separate the dates in these strings we know that lin tells us the length of a string let's check how long these date time strings are now to start we'll input the first part of the formula and then we'll just select one of the cells with the date time string in it these strings are 19 characters long we can use the find function to locate specific characters in a string keep in mind this is case sensitive so if you're using find to pull a sub string make sure that you've input the substring correctly we notice that all of the date time strings have a space separating the date and the timestamp so we can actually use fine to figure out where the date ends okay seems like the space is the 11th character in this string so the timestamp substring will start at character 12. we can use the left and right functions to select which parts of the string we want to isolate in a new column we'll use right on one of these cells to indicate that we want to grab the right side and like we've come across before left actually works exactly the same way now we can apply that to the rest of column c to pull those timestamps as a data analyst being able to work with strings is a key skill especially when you find yourself working with data from outside sources hopefully you're a little bit more comfortable applying lin right left and find functions in both sql and spreadsheets later on we'll use these functions to perform even more complicated formulas so feel free to try them out on some data yourself maybe even some open data like we've been using today see you later so data analysts spend a lot of time problem solving and that means there's going to be times when you get stuck but the trick is knowing what to do when that happens in this video we'll talk about the importance of knowing how to get help whether that means asking someone else for help or searching the internet for answers asking other people about a problem you're having can help you find new solutions that move a project forward so it's always a good idea to reach out to your peers and mentors especially if they're working with you on that project your team members have valuable knowledge and insight that can help you find the solution you need to get unstuck sometimes we spend a lot of time spinning our wheels saying i can do this myself but we can be way more productive if we engage with other people find new resources to lean on and try to get as many voices as we can involved for example let's say you're working with the bike trip time data from the previous videos maybe you're trying to find the average time between bike rides in a given month calculating the difference between bike rides before midnight is easy but you can run into a problem if the elapsed time crosses into the next day if someone went on a bike ride at 11 pm but the next ride wasn't until 6 am your formula will return a negative number because the end time is less than the start time you know that you can add one minus the start time if two bike rides start and end on different days but that formula won't work on times that happen in the same day and it's pretty inefficient to scroll through every bike ride to pinpoint these special cases you need to find a way to build a conditional formula but you aren't sure how so you decide to check in with other analysts working on your team to see if they have any ideas you could send them a quick email or stop by their desk to find out if they have a minute to talk it over with you turns out they had a similar problem on a previous project and they're able to show you a conditional formula that you could use to speed up your calculations great they suggest using an if formula like this this basically says that if the end time is larger than the start time replace the standard end time minus start time formula with one minus start time plus end time now it's also possible that your team members don't have an answer that's okay too there's definitely someone else with the same problem asking the same questions online knowing how to find solutions online is an incredibly valuable problem-solving tool for data analysis there's also all kinds of forums where spreadsheet users can ask questions and you never know what you can turn up with just a basic search for example let's say you look at calculate number of hours between times spreadsheets and find a helpful walkthrough for a more complicated formula using mod this flips the negative values into positive ones solving your calculation problem whether you're asking someone you know or searching the internet for answers reaching out for help can give you some really interesting solutions and new ways to solve problems for future analysis coming up we'll learn even more about searching for solutions online see you soon [Music] earlier we talked a little bit about finding resources online to help you figure out solutions to problems during analysis the internet has so much knowledge and advice to offer but you need to know how to find it so in this video we'll talk more about finding answers online you might think that great data analysts don't regularly rely on outside resources but that's a myth the best data analysts know that finding answers to their problems online can be empowering and give them new knowledge for the future being able to find new ideas and combine them with what you already know can help you come up with some amazing things so don't be afraid to turn to the internet to find your answers it's a great resource that lots of analysts use me included but let's talk more about how you can make sure you're using web resources in the best way possible there's a combination of best practices that you can use to guide your search for answers online by practicing the thinking skills we've learned in this program using the right data analytics terms and your basic knowledge of analysis tools you have everything you need to find answers and apply them to your own work and it starts with how you approach a problem mentally you've learned about different kinds of thinking skills and how to practice them and your data analysis work from analytical to mathematical to structured thinking this helps build your mental model or your thought process and the way you approach a problem data analysts use these thinking skills to approach a problem logically and break it into smaller parts building this into your own problem solving process can help you pinpoint specific questions which you can use to find resources more easily for example maybe you keep running into an error in your analysis you narrow it down to two possibilities your formula or the data itself you double check your formula and you see that it's correct so now you know that you need to make sure that data has been entered correctly you consider the problem logically and tracked it down to the source using your mental model next it's important to use the right terms when searching for solutions knowing how to frame data analytics questions with the same language other analysts are using will help you get more search results and it'll help you understand what other analysts are saying for example maybe you need to use the left four characters of a string for a column in sql how would you search for this searching for four characters and a column is a little vague and might not bring up specific resources but left string query sql uses some keywords that other data analysts are also using to talk about these things on top of being able to use the right terms to search online you also need to be familiar with basic tools that way when an online resource is walking you through a new function in a tool you've used before you'll know how those tools work for example if you find a spreadsheet formula online you need to understand how formulas work to apply it to your own spreadsheet or maybe the data set you're working with is too large for a single spreadsheet and you'll need to switch to sql having a variety of tools in your tool kit is important as a data analyst but just as important as knowing when to use them if you find yourself stuck on a problem it can be a good idea to take a step back and reconsider how you're approaching a task we've covered a lot of tools that you can use as a data analyst throughout this program and later you'll learn one more r we'll talk about r more later on but here's a sneak peek to get you excited for it r is another programming language but it's not a database language like sql it's a programming language frequently used for statistical analysis visualization and other data analysis r is a little different from other tools we've been working with but it's a great complement for the tools you're already using and it will give you more potential solutions when you run into problems using the thinking skills we've learned throughout this program the right terms and your understanding of different analysis tools will get you ready for the next steps of this process actually searching for answers online there's a lot of resources like program support websites and forums where other data analysts are asking and answering questions in an earlier video we ran into a problem trying to calculate the time elapsed between bike rides and the bike sharing data maybe our first search calculate time in spreadsheets didn't turn up the answers we needed by thinking about our specific question and how other data analysts might be asking it we could change that search into conditional formula for calculating elapsed time in spreadsheets and now we have more specific solutions to our problem finally being able to modify example code to fit your own needs is so useful understanding the syntax of formulas and functions for different tools will allow you to take what you learned online and make it work for you and maybe even build on it to create a whole new solution for example the mod formula we built to account for trips that started and ended on different days and our bike sharing data the mod formula we found online wasn't created for the data we were working with but because we are familiar with spreadsheet tools we were able to apply it to our data and use it as a solution to our problem great data analysts know how to find and use resources online to help them build new solutions to problems they face by using the thinking skills you've already learned in this program and using your knowledge of data analytics tools and terms you can too and once you've found some answers to your problems you can build them into your analysis work to get past any challenge you might face and that brings us to the end of this module great job we've covered a lot of information we've learned about converting and formatting data how to combine multiple pieces of data and how to search for help when you need support during your analysis coming up next you'll take on the weekly challenge as always feel free to go back over anything you've learned from these past videos then i'll see you for the next video good luck congratulations on finishing this video from the google data analytics certificate access the full experience including job search help and start to earn the official certificate by clicking the icon or the link in the description watch the next video in the course by clicking here and subscribe to our channel for more from upcoming google career certificates
Info
Channel: Google Career Certificates
Views: 866
Rating: 5 out of 5
Keywords: Grow with Google, Career Change, Tech jobs, Google Career Certificate, Google Career Certificates, Job skills, Coursera, Certification, Google, professional certificates, professional certificate program, Data analyst, Data analytics, Data analysis, Data analytics for beginners, What is data analytics, Sql, Data, R Programming, Spreadsheets, R tutorial, SQL tutorial for beginners, what is SQL, learn SQL, R programming for beginners
Id: ofcG687FGsU
Channel Id: undefined
Length: 32min 28sec (1948 seconds)
Published: Fri Jun 18 2021
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.