Understanding Data Types and Structures | Google Data Analytics Certificate

Video Statistics and Information

Captions
this video is part of the google data analytics certificate providing you with job-ready skills to start or advance your career in data analytics get access to practice exercises quizzes discussion forums job search help and more on coursera and you can earn your official certificate visit grow.google/datacert to enroll in the full learning experience today [Music] picture this you're working on a project you've asked all the right questions applied structured thinking and you're completely in sync with your stakeholders you're off to a great start but there's another step in the process preparing the data correctly this is where understanding the different types of data and data structures comes in knowing this lets you figure out what type of data is right for the question you're answering plus you'll gain practical skills about how to extract use organize and protect your data hey my name's halle and i'm an analytical lead at google i work with companies in the healthcare industry i'm so excited to welcome you to this course you've been building up your data analyst skills in lots of different ways so far you've learned how to ask the right questions define the problem and present your analysis in a way that matches up with the needs of your stakeholders in other words you've learned how to tell a story using data now we'll learn more about the data that you'll need to tell the best story possible but before we do that i'd love to tell you my story i use analytics to help healthcare companies develop digital marketing solutions that make their business and their brand stronger my team and i find business and media opportunities based on the latest industry and data insights i've been working in healthcare for about five years and it's great i really enjoy being able to use data to help spark change in such an important industry as you'll discover in this course data can be the main character in a very powerful story i absolutely love using analysis to tell that 
story in a way that's compelling and informative here's a real life example of how i've used data to tell a story in my job we analyze medicare enrollment data over time and make connections to how people research medicare plans on google as people 65 and older become more informed decision makers for their health i use the data to learn if there's an increase in medicare enrollments and what part google searches play if there is an increase in demand now it's very important that i make sure the data is relevant and valid i also have to pay attention to questions around access and equity while maintaining the privacy of those conducting searches the happy ending of my story is that the data in my findings is useful to medical professionals and their patients there's so much useful data out there and you're building the skills you'll need to find and use the right data in the best way in this course you'll continue sharpening those skills so you've already heard a lot about the data analysis process steps ask prepare process analyze share and act now it's time to learn how to prepare the data you'll learn to identify how data is generated and collected and you'll explore different formats types and structures of data we'll make sure you know how to choose and use data that'll help you understand and respond to a business problem and because not all data fits each need you'll learn how to analyze data for bias and credibility we'll also explore what clean data means but wait there's more you'll also get up close and personal with databases we'll cover what they are and how analysts use them you'll even get to extract your own data from a database using a couple of tools that you're already familiar with spreadsheets and sql the key here is patience like anything worth doing this will take time and practice and i'll be with you every step of the way still with me great the last few things we'll cover are the basics of data organization and the process of protecting 
your data data works best when it's organized and if you're organizing your data you'll want to protect it too i'll show you how to do both and apply it to your own analysis i'm so excited to help you write your own personal story as you continue exploring the world of data analytics so let's do it right now data is being generated all around the world and we're talking tons of data every minute of every day millions of texts and hundreds of millions of emails are sent on top of that millions of online searches are made and videos viewed and those numbers are only growing that's a lot of data so let's learn more about how it's made and used in this video we'll talk about the ways that data can be generated and how industries collect data themselves every piece of information is data and all that data is usually generated as a result of our activity in the world these days we spend a lot of time online with social media and mobile devices millions and millions of people are adding to the huge amount of data out there each and every day think about it like this every digital photo online is one piece of data and every photo itself holds even more data from the number of pixels to the colors contained in each of those pixels but that's not the only way data is made we can also generate data by collecting information this kind of data generation and collection comes with a few more things to think about it needs to be done with consideration to ethics so that we maintain people's rights and privacy we'll learn more about that later on for now let's check out a real world example the united states census bureau uses forms to collect data about the country's population this data is used for a number of reasons like funding for schools hospitals and fire departments the bureau also collects information about things like u.s businesses creating their own data in the process the great thing about this is that others can then use the data for their own needs including 
analysis the annual business survey is used to figure out the needs of businesses and how to provide them with resources to help them succeed i actually generate data in the analytics i do for the healthcare industry we run a lot of surveys to learn how patients feel about certain things related to their health care for example one survey asked how patients feel about telemedicine versus in-person doctor visits the data we collected help the companies we work with improve the care that their patients receive survey data is just one example there's all kinds of data being generated all the time and there's lots of different ways to collect it even something as simple as an interview can help someone collect data imagine you're in a job interview to impress the hiring manager you'll want to share information about yourself the hiring manager collects that data and analyzes it to help them decide whether to hire you or not but it goes both ways you could also collect your own data about the company to help you decide if the company is a good fit for you or you can use the data you collect to come up with thoughtful questions to ask the interviewer scientists also generate data they use a lot of observations in their work for example they might collect data by studying animal behavior or looking at bacteria under a microscope earlier we talked about the forms that the u.s census bureau uses to collect data forms questionnaires and surveys are commonly used ways to collect and generate data one thing to note data that's generated online doesn't always happen directly have you ever wondered why some online ads seem to make really accurate suggestions or how some websites remember your preferences this is done using cookies which are small files stored on computers that contain information about users cookies can help inform advertisers about your personal interests and habits based on your online surfing without personally identifying you as a real world analyst you'll 
have all kinds of data right at your fingertips and lots of it too knowing how it's been generated can help add context to the data and knowing how to collect it can make the data analysis process more efficient coming up you'll learn how to decide what data to collect for your analysis so stay tuned [Music] we've talked a lot about all the data out there in the world but as a data analyst you'll need to decide what kind of data to collect and use for every project and with a nearly endless amount of data out there this can be quite a bit of a data dilemma but there's good news in this video you'll learn which factors to consider when collecting data usually you'll have a head start in figuring out the right data for the job because the data you need will be given to you or your business task or problem will narrow down your choices so let's start with a question like what's causing increased rush hour traffic in your city first you need to know how the data will be collected you might use observations of traffic patterns to count the number of cars on city streets during particular times you notice that cars are getting backed up on a specific street that brings us to data sources in our traffic example your observations would be first party data this is data collected by an individual or group using their own resources collecting first party data is typically the preferred method because you know exactly where it came from you might also have second party data which is data collected by a group directly from its audience and then sold so in our example if you aren't able to collect your own data you might buy it from an organization that's led traffic pattern studies in your city this data didn't start with you but it's still reliable because it came from a source that has experience with traffic analysis the same can't always be said about third-party data or data collected from outside sources who did not collect it directly this data might have come from a 
number of different sources before you investigated it so it might not be as reliable but that doesn't mean it can't be useful you'll just want to make sure you check it for accuracy bias and credibility actually no matter what kind of data you use it needs to be inspected for accuracy and trustworthiness we'll learn more about that process later for now just remember that the data you choose should apply to your needs and it must be approved for use as a data analyst it's your job to decide what data to use and that means choosing the data that can help you find answers and solve problems and not getting distracted by other data in our traffic example financial data probably wouldn't be that helpful but existing data about high volume traffic times would be okay now let's talk about how much data to collect in data analytics a population refers to all possible data values in a certain data set so if you're analyzing data about car traffic in a city your population would be all of the cars in that area but collecting data from the entire population can be pretty challenging that's why a sample can be useful a sample is a part of a population that is representative of the population you might collect a data sample about one spot in the city and analyze the traffic there or you might pull a random sample from all existing data in the population how you choose your sample will depend on your project as you collect data you'll also want to make sure you select the right data type for traffic data an appropriate data type could be the dates of traffic records stored in a date format the dates could help you figure what days of the week there is likely to be a high volume of traffic in the future we'll explore this topic in more detail soon finally you need to determine the time frame for data collection in our example if you needed an answer immediately you'd have to use historical data which is data that already exists but let's say you needed to track traffic patterns 
over a long period of time that might affect the other decisions you make during data collection and now you know more about the different data collection considerations you'll use as a data analyst and because of that you'll be able to find the right data when you start collecting it yourself there's still more to learn about data collection so stay tuned [Music] i don't know about you but when i'm choosing a movie to watch i sometimes get stuck between a couple of choices if i'm in the mood for excitement or suspense i might go for a thriller but if i need a good laugh i'll choose a comedy if i really can't decide between two movies i might even use some of my data analysis skills to compare and contrast them come to think of it there really needs to be more movies about data analysts i'd watch that but since we can't watch a movie about data at least not yet we'll do the next best thing watch data about movies we're going to take a look at the spreadsheet with movie data we know we can compare different movies and movie genres turns out you can do the same with data and data formats let's use our movie data spreadsheet to understand how that works we'll start with quantitative and qualitative data if we check out column a we'll find titles of the movies this is qualitative data because it can't be counted measured or easily expressed using numbers qualitative data is usually listed as a name category or description in our spreadsheet the movie titles and cast members are qualitative data next up is quantitative data which can be measured or counted and then expressed as a number this is data with a certain quantity amount or range in our spreadsheet here the last two columns show the movie's budget and box office revenue the data in these columns is listed in dollars which can be counted so we know that data is quantitative we can go even deeper into quantitative data and break it down into discrete or continuous data let's check out discrete data first this is 
data that's counted and has a limited number of values going back to our spreadsheet we'll find each movie's budget and box office returns in columns m and n these are both examples of discrete data they can be counted and have a limited number of values for example the amount of money a movie makes can only be represented with exactly two digits after the decimal to represent cents there can't be anything between one and two cents continuous data can be measured using a timer and its value can be shown as a decimal with several places so let's imagine a movie about data analysts that i'm definitely going to star in someday you could express that movie's runtime as 110.035 minutes you could even add fractional data after the decimal point if you needed to there's also nominal and ordinal data nominal data is a type of qualitative data that's categorized without a set order in other words this kind of data doesn't have a sequence here's a quick example let's say you're collecting data about movies you ask people if they've watched a given movie their responses would be in the form of nominal data they could respond yes no or not sure these choices don't have a particular order ordinal data on the other hand is a type of qualitative data with a set order or scale if you asked a group of people to rank a movie from one to five some might rank it as a 2 others of 4 and so on these rankings are in order of how much each person liked the movie now let's talk about internal data which is data that lives within a company's own systems for example if a movie studio had compiled all of the data in the spreadsheet using only their own collection methods then it would be their internal data the great thing about internal data is that it's usually more reliable and easier to collect but in this spreadsheet it's more likely that the movie studio had to use data owned or shared by other studios and sources because it includes movies they didn't make that means they'd be 
collecting external data external data is you guessed it data that lives and is generated outside of an organization external data becomes particularly valuable when your analysis depends on as many sources as possible a great thing about this data is that it's structured structured data is data that's organized in a certain format such as rows and columns spreadsheets and relational databases are two examples of software that can store data in a structured way you might remember our earlier exploration of structured thinking which helps you add a framework to a problem so that you can solve it in an organized and logical manner you can think of structured data in the same way having a framework for the data makes the data easily searchable and more analysis ready as a data analyst you'll work with a lot of structured data which will usually be in the form of a table spreadsheet or relational database but sometimes you'll come across unstructured data this is data that is not organized in any easily identifiable manner audio and video files are examples of unstructured data because there's no clear way to identify or organize their content unstructured data might have internal structure but the data doesn't fit neatly in rows and columns like structured data and there you have it hopefully you're now more familiar with data formats and how you might use them in your work and in just a bit you'll continue to explore structured data and learn even more about the data you'll use most often as an analyst coming soon to a screen near you earlier we compared some data formats including structured and unstructured data most of the data being generated right now is actually unstructured audio files video files emails photos and social media are all examples of unstructured data these can be harder to analyze in their unstructured format but here's the good news you'll be working with structured data most of the time for example if you need to analyze data about the 
unstructured data in emails photos and social media sites it'll most likely be structured for analysis before you even get to it because of that i want to explore structured data a bit more as a quick refresher structured data is data organized in a format like rows and columns but there's definitely more to it than that structured data works nicely within a data model which is a model that is used for organizing data elements and how they relate to one another what are data elements they're pieces of information such as people's names account numbers and addresses data models help to keep data consistent and provide a map of how data is organized this makes it easier for analysts and other stakeholders to make sense of their data and use it for business purposes in addition to working well within data models structured data is also useful for databases this makes it easy for analysts to enter query and analyze the data whenever they need to this also helps make data visualization pretty easy because structured data can be applied directly to charts graphs heat maps dashboards and most other visual representations of data all right so now we know that spreadsheets and databases that store data sets are widely used sources of structured data after you explore some other data structures you'll check out more data types using a spreadsheet the adventure continues [Music] by now you've learned a lot about data from generated data to collected data to data formats it's good to know as much as you can about the data you'll use for analysis in this video we'll talk about another way you can describe data the data type a data type is a specific kind of data attribute that tells what kind of value the data is in other words a data type tells you what kind of data you're working with data types can be different depending on the query language you're using for example sql allows for different data types depending on which database you're using for now though let's focus on 
the data types that you'll use in spreadsheets to help us out we'll use a spreadsheet that's already filled with data we'll call it worldwide interest in sweets through google searches now a data type in a spreadsheet can be one of three things a number a text or string or a boolean you might find spreadsheet programs that classify them a bit differently or include other types but these value types cover just about any data you'll find in spreadsheets we'll look at all of these in just a bit looking at columns b d and f we find number data types each number represents the search interest for the terms cupcakes ice cream and candy for a specific week the closer a number is to 100 the more popular that search term was during that week 100 represents peak popularity keep in mind that in this case 100 is a relative value not the actual number of searches it represents the maximum number of searches during a certain time think of it like a percentage on a test all other searches are then also valued out of one hundred you might notice this in other data sets as well gold star for a hundred if you needed to you could change the numbers into percents or other formats like currency these are all examples of number data types in column h the data shows the most popular treat for each week based on the search data so as we'll find in cell h4 for the week beginning july 28 2019 the most popular treat was ice cream this is an example of a text data type or a string data type which is a sequence of characters and punctuation that contains textual information in this example that information would be the treats and people's names these can also include numbers like phone numbers or numbers and street addresses but these numbers wouldn't be used for calculations so in this case they're treated like text not numbers in columns c e and g it seems like we've got some text but the text here isn't a text or string data type instead it's a boolean data type a boolean data type is a data 
type with only two possible values true or false column c e and g show boolean data for whether the search interest for each week is at least 50 out of 100. here's how it works to get this data we've created a formula that calculates whether the search interest data in columns b d and f is 50 or greater in cell b4 the search interest is 14. so in cell c4 we find the word false because for this week of data the search interest is less than 50. so for each cell in column c e and g the only two possible values are true or false we could change the formula so other words appear in these cells instead but it's still boolean data you'll get a chance to read more about the boolean data type soon let's talk about a common issue that people encounter in spreadsheets mistaking data types with cell values for example in cell b57 we can create a formula to calculate data in other cells this will give us the average of the search interest in cupcakes across all weeks in the data set which is about 15. 
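The three spreadsheet data types described so far (number, text or string, and boolean) can be sketched roughly in Python. This is only an illustration under stated assumptions: the video works with spreadsheet formulas, not Python; the variable names are hypothetical; and apart from the value 14 mentioned for cell b4, the weekly scores are made-up placeholders chosen so the average lands near the "about 15" mentioned above.

```python
from statistics import mean

# Number data: hypothetical weekly search-interest scores (0 to 100) for cupcakes.
# Only the first value, 14, comes from the video (cell B4); the rest are placeholders.
cupcake_interest = [14, 16, 15, 13, 17]

# Text (string) data: a sequence of characters that is not used for calculations.
most_popular_treat = "ice cream"

# Boolean data: one True/False value per week, similar in spirit to the
# formula behind column C that checks whether interest is at least 50.
THRESHOLD = 50
at_least_50 = [score >= THRESHOLD for score in cupcake_interest]

# Number data can be averaged, similar to the formula placed in cell B57.
average_interest = mean(cupcake_interest)

print(at_least_50)
print(average_interest)
```

Keeping the threshold check as a list of booleans mirrors the spreadsheet layout, where each boolean cell sits next to the number it was computed from.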
the formula works because we calculated using a number data type but if we tried it with a text or string data type like the data in column c we'd get an error error values usually happen if a mistake is made in entering the values in the cells so the more you know your data types and which ones to use the less errors you'll run into there you have it a data type for everyone and we're not done yet coming up we'll go deeper into the relationship between data types fields and values see you soon [Music] here's a riddle for you what do a music playlist a calendar agenda and an email inbox have in common i'll give you a hint it's not a weekly jam session the answer is they're all arranged in tables go ahead and check out your email inbox or a favorite playlist or look at your calendar agenda there's tables in every one a data table or tabular data has a very simple structure it's arranged in rows and columns you can call the rows records and the columns fields they basically mean the same thing but records and fields can be used for any kind of data table while rows and columns are usually reserved for spreadsheets when talking about structured databases people in data analytics usually go with records and fields sometimes a field can also refer to a single piece of data like the value in a cell in any case you'll hear both versions of these terms used throughout this program and your job let's go back to our playlist example we'll use the new terms we just introduced so each song is a record each record has the same fields as the other records in the same order in other words the playlist has the same information about each song each song characteristic like the title and the artist is a field each separate field has the same data type but different fields can have different types let me show you what i mean for the song list the song titles are text or string type while the song's length could be a number type if you're using it for calculations or it could be a 
date and time type the column for favorites is boolean since it has two possible values favorite or not favorite we can view spreadsheets in the same way the records in a spreadsheet might be about all sorts of things clients products invoices or really anything else each record has several fields which reveal more about the clients products or invoices the value in every cell contains a specific piece of data like the address of a client or the dollar amount of an invoice as a data analyst lots of data will come your way and records fields and values in data tables will help you navigate analysis understanding the structures of the tables you're working with is a part of that and hopefully while you're working hard on your analysis and those tables you can have a little fun with a different data table the one with your favorite playlist you probably use the words wide and long all the time you might use wide to describe the size of something from side to side like a wide river but a river can also travel great distances so you might call it long as well wait before you stop the video i promise you didn't accidentally click in the wrong course i'm not here to teach you words you already know but the words wide and long can be used to describe data too so i am here to help you understand wide data and long data so far you've dealt with data arranged mostly in a wide format with wide data every data subject has a single row with multiple columns to hold the values of various attributes of the subject here's some wide data in a spreadsheet you might remember we discussed this data about the population of latin and caribbean countries earlier for this data set each row provides all of the population information about one country each column shows the population for a different year for example you'll find the annual population of argentina in row 2. 
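A wide table like the one just described can be sketched as a list of rows, one per country, with one field per year. This is a minimal illustration only: the helper name `population` is hypothetical, and the figures are placeholders, not real population values.

```python
# Wide format: one row (record) per country, one column (field) per year.
# The population figures below are placeholders, not real values.
wide_rows = [
    {"country": "Argentina", "2010": 100, "2011": 110},
    {"country": "Brazil", "2010": 200, "2011": 210},
]


def population(rows, country, year):
    """Return the value stored in the given country's row and year column."""
    for row in rows:
        if row["country"] == country:
            return row[year]
    raise KeyError(country)


# Comparing countries for one year means reading down a single column.
largest_2010 = max(wide_rows, key=lambda row: row["2010"])["country"]

print(population(wide_rows, "Argentina", "2011"))
print(largest_2010)
```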
wide data lets you easily identify and quickly compare different columns in our example the data is arranged alphabetically by country so you can compare the annual populations of antigua and barbuda aruba and the bahamas by just checking out the values in each column the wide data format also makes it easy to find and compare the country's populations at different periods of time for example by sorting the data we discovered that brazil had the highest population of all countries in 2010 and the british virgin islands had the lowest population of all countries in 2013. okay now let's explore this data in a long format here the data is no longer organized into columns by year all the years are now in one column with each country like argentina appearing in multiple rows one for each year of data this is how long data usually looks long data is data in which each row is one time point per subject so each subject will have data in multiple rows our spreadsheet is formatted to show each year of population data here we see antigua and barbuda first long data is a great format for storing and organizing data when there's multiple variables for each subject at each time point that we want to observe with this long data format we can store and analyze all of this data using fewer columns plus if we added a new variable like the average age of a population we'd only need one more column if we'd used a wide data format instead we would have needed 10 more columns one for each year the long data format keeps everything nice and compact if you're wondering which format you should use the simple answer is it depends sometimes you'll have to transform wide data into a long data format or other times vice versa you'll probably work with both formats in your job and you'll definitely revisit both formats again later in this program that reminds me earlier we defined data as a collection of facts as you've discovered over the last few videos that collection of facts can take on lots of 
different formats structures types and more learning about all of the ways that data can be presented will be a big help to you throughout the data analysis process the more you work with data in all its forms the quicker you'll start to recognize which data to use and when to use it and in just a bit you'll use all that data stored in your brain to help you take an assessment after that you'll learn how to identify and avoid bias in data and how to embrace credibility integrity and ethics the data adventure moves forward i'm so glad you're moving with it congratulations on finishing this video from the google data analytics certificate access the full experience including job search help and start to earn the official certificate by clicking the icon or the link in the description watch the next video in the course by clicking here and subscribe to our channel for more from upcoming google career certificates
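The captions mention that analysts sometimes have to transform wide data into a long format. One way that reshaping might look is sketched below in plain Python. The function name `wide_to_long` and the field names are hypothetical, the figures are placeholders rather than real population values, and in practice a spreadsheet or a data library would usually do this work.

```python
def wide_to_long(rows, id_field, value_name):
    """Turn one-row-per-subject data into one row per subject per year."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_field:
                continue  # the identifier is repeated on every long row instead
            long_rows.append(
                {id_field: row[id_field], "year": column, value_name: value}
            )
    return long_rows


# Placeholder wide data: one row per country, one column per year.
wide_rows = [
    {"country": "Argentina", "2010": 100, "2011": 110},
    {"country": "Brazil", "2010": 200, "2011": 210},
]

long_rows = wide_to_long(wide_rows, "country", "population")
for record in long_rows:
    print(record)
```

In the long result each country appears once per year, so adding another variable such as average age would mean adding a single field to each record rather than one new column per year.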
Info
Channel: Google Career Certificates
Views: 1,544
Rating: 5 out of 5
Keywords: Grow with Google, Career Change, Tech jobs, Google Career Certificate, Google Career Certificates, Job skills, Coursera, Certification, Google, professional certificates, professional certificate program, Data analyst, Data analytics, Data analysis, Data analytics for beginners, What is data analytics, Sql, Data, R Programming, Spreadsheets, Spreadsheet, Types of data, Database, What is a Database, Database design
Id: NhwvA5Zqtio
Length: 29min 40sec (1780 seconds)
Published: Fri Jun 04 2021