Google Data Analytics Certificate Course 4 of 8 - Process Data from Dirty to Clean

Video Statistics and Information

Captions
Process Data from Dirty to Clean: this is course number four of eight in the Google Data Analytics Professional Certificate program, and I'm going to review it right now. Hey everybody, Matt Brattin here with tmbanalytics.com, your analytics career headquarters. We are reviewing course number four of eight in the Google Data Analytics Professional Certificate program. It is called Process Data from Dirty to Clean, and, as I said at the start of the last video, I also gave this one 5 out of 5 stars. I thought this was a really good course, even for something as seemingly silly as processing dirty data. I think it hit so close to home because anybody who does analytics knows that dirty data is life. In fact, I should get a shirt that says that. But seriously, it's a thing, and being able to recognize dirty data, clean it, and deal with all of these issues is imperative for life in this career. So let's go ahead and jump right in with the quick review.

You can see here that I've completed the course. This is a six-week course, so it's another one on the longer end; some of the courses have been as short as four weeks. Let's jump right into week number one, which is about checking for integrity before you clean. Very good advice there. For some reason the course isn't marking all of the items I've done, even though I've definitely completed it and gone through all of this material. Either I got lazy and stopped marking everything, or something else happened, but trust me, I've done it; you just saw that I've completed it. As with the other courses, the very beginning goes through the syllabus, gives you a quick refresher on where you are in the roadmap, where you're headed, and what you've done so far, and reminds you why you're doing this. Then we jump right into the content.

First up is maintaining data integrity and why it's important. Just hearing that, you have a sense of why: if your data has no integrity, you can't do much with it, because, as everybody knows, garbage in, garbage out. This section is all about making sure you're not putting garbage into the big analytical machine and that what you do put in is actually quality. It gets into compliance, and it walks through scenarios and common situations where certain data inconsistencies might cause issues for you. I love that it has a quick reference guide for data constraints, with examples of things to take into consideration. It also talks about balancing your organization's objectives with the integrity of the data and making sure the two are well aligned. This is important to consider because you have to make sure the data you have will actually support the analysis. It's like the analysis before the analysis.

Here's a quick tip for anybody in the analytics space. At least, this is how my brain works: when I'm asked a question or given an analytical objective, I don't just think about the question being asked; I think about the data that's available. So I'm running two lines of thinking while somebody is asking a question. Number one, do I understand the question being asked? Number two, can I address it with the data that's available, and are there known issues with that data that need to be overcome? So when I'm talking with stakeholders about requests, I'll often bring up some of the limitations of the data while we're discussing it and offer alternatives. I'll say, look, I understand this is what you're looking for, and this is what we do have. These are my concerns with that, but what we can deliver with more confidence is this, and here's how I think it might help. A lot of this section is, in a roundabout way, about that sort of thing: how to make sure you're reaching accurate conclusions.

It gets into sample sizes and actual data collection: what to do when you're dealing with insufficient data, how to know when you can move forward, and why different sample sizes matter for different types of business problems. There are all sorts of really good tidbits here on best practices you should be following as you work through a project. It also gives you options for testing data, what to do when there is no data, and what good proxies are for data you wish you had. It talks about when you could use historical data versus data that's readily available, and when you should actually start collecting data versus holding back and going another route, depending on the urgency of the request. All really good things to consider. There are lots of optional exercises, and it covers margin of error and why that's important to understand, along with sample sizes. A lot of it is statistics-type principles, and anybody who has taken a statistics course probably already has their ears ringing when I talk about sample sizes, sample collection, margin of error, and all that kind of fun stuff. So that's week number one, prepping you for what's to come.

Then we go into week number two, which is all about clean data. Here it gets into what dirty data is, with examples: data that's duplicate, outdated, incomplete, inconsistent, or incorrect. Obviously you need to be familiar with your data in order to even answer these questions. This isn't something you assume. You don't just assume your data is good, and you also don't assume it's incomplete; you have to be familiar with it. So it gives you ideas for identifying these different situations, with descriptions, possible outcomes, and the harm it could cause if you were to use that data. This goes deeper on the whole garbage-in, garbage-out idea: if your data is this type of garbage, then this is the type of garbage you're going to deliver to the business. Is that really what you want? It goes into a lot of that.

It also covers some data cleaning techniques, which are super helpful to know, and common data cleaning pitfalls. It talks about documenting your steps. I might be getting ahead of myself, but it does get into the importance of documentation: keeping a log, a catalog not only of the sources and uses of your data but of the way you're treating your data as you prepare it for use. All really important stuff. Let me see if there's anything else worth showing off here. Different data perspectives is just a video, but workflow automation was also kind of helpful, because it makes you think about the different ways you deal with the data (handling, modeling, cleaning, presenting) and ask: can you automate it, and should you automate it? It gets into the whys. This is also prepping you mentally for what's to come later in the course, so week number two was good for setting the table.

Week number three is cleaning data in SQL, so now we're going back to SQL and talking about the different things you can do with it: understanding the capabilities of SQL and using SQL as a junior data analyst. It covers different tasks you might be asked to do, like how to count certain things and identify duplicates using SQL. It also talks about why you might be using spreadsheets in one instance and what the benefits of using something like SQL might be instead. Obviously, the larger the data set, the more benefit you'll get from something like SQL, which can parse through data much more quickly than a spreadsheet. It's also generally going to be more reliable, because you're giving it specific commands, and as long as you know what you're doing when you give those commands, you're going to feel more confident in the data that comes out of it.

It covers different SQL dialects and what SQL actually is (Structured Query Language), and it talks about MySQL, Postgres, and all that kind of fun stuff. This is good because these are common questions people have: hey, I learned MySQL or T-SQL or whatever, so how hard is it going to be to learn Postgres? It gets into the details behind that, what SQL actually is, and how and why those different variations exist. That's helpful learning. There's also a section on widely used queries, with a lot of video content, a practice quiz, transforming data, and debugging SQL code. There's a kind of fun discussion prompt. I obviously didn't do anything there, but they're reintroducing the push to get involved. It's not nearly as pushy as it was in the earlier courses of the certificate program, but I think they're prefacing the idea that they hope people will start having more technical, engaging discussions in the forums. Then it gets into the practice quizzes, the weekly challenge, and all that fun stuff, and I've obviously done all of those because, again, I passed the test.

Week number four is verifying and reporting on your cleaning results. Here I guess I wasn't as lazy and actually marked some stuff, though I still think it's super weird that it's not showing all my progress, because I definitely went through all of it. It covers verifying reports, the final step in data cleaning, and a data cleaning verification checklist. That one was good. I'm learning through this process that I'm a big fan of the reading materials, the actual tangible lists of things that I can skim and look at. With videos, you're on the roller coaster and you've got to be along for the ride. Even at 2x speed I get impatient, because I know the specific information I'm about to be told. Maybe at this point, because a lot of this is refresher for me, I'm personally less motivated to watch a video, but I love the readings. A lot of them are very action-packed, dense reading material that I really enjoy; it's like a pile of cheat sheets. This one talks about the goal of your project: making sure you understand the business problem, confirming the goal, verifying the data is there, and so on. Lots of good stuff here, plus some more practice quizzes.

Capturing cleaning changes is where we get into change logs. It shows you how to go into different spreadsheets, particularly Google Sheets, and review the change history so you can go back to it. It even shows you how to do it in Microsoft Excel, and in BigQuery you can look at history iterations, things like that. All very helpful. It also talks about creating your own change logs and making sure you've got version control history, which is important to keep in mind so that everybody understands what you've done and when you did it. It also gets into some advanced functions, the syntax differences between Excel and Google Sheets, what different functions do, pulling data from multiple sources, and all that jazz. All good things to know. Then you've got your weekly challenge.

If it sounds like I'm being somewhat repetitive, it's because that's how I review these courses: I kind of go through the motions. I like to focus on the readings, I highlight the videos, and then I talk about the quizzes and some of the more hands-on stuff.

Week number five was optional, and week number six is the course challenge. The optional week five is about adding data to your resume. If you watched my course number three review, I talked about how there was an optional final section at the end that gets into your LinkedIn profile, building your personal brand, and all of that. The optional module here is all about your resume. This is how the program slowly introduces the professional angle of the certificate, which is supposed to help you prepare for getting a job or becoming more employable in this space. They're adding these optional modules so that you look at your LinkedIn, get it up to date, and learn how to network and send messages, and in this course they talk about how to update your resume and what a data analyst resume might look like. They've got very good video content going through all of those things. There are CareerCon resources on YouTube, so it points you to different places where you can watch videos on YouTube, and it covers highlighting your experiences. Again, all of this feedback is solid. If you actually go through this step by step, they're giving you really good information and insights into things you should take into consideration when you're building out your profile as an analyst. Then there's exploring your areas of interest (where do your interests lie?) and sharing resume best practices, where it asks you to go into the forums and start talking about some of this stuff.

Last is week six, which is just the course challenge. You can see here it's basically getting you ready for it, with a video and then the quiz. That was week six. I don't know that I would actually call this a six-week course, because the first four weeks were all the content, week five was the optional module on your resume, and week six is just the challenge. I don't necessarily feel like I was shorted anything in particular; I'm just making an observation about what that's all about.

So anyway, that was Process Data from Dirty to Clean, course number four of the Google Data Analytics Professional Certificate program. Very important topics, very good coverage of those topics, and good exercises. The quizzes were challenging and very thoughtfully set up. So again, I gave it five stars; I think it's important, very valuable information. That's all I've got. If you have any questions or comments, go ahead and throw them down in the comments below. I'd love to hear from you guys. That's all for now. Thanks for watching.
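The review mentions that week one covers sample sizes and margin of error but doesn't show the course's own calculators. As a minimal sketch, assuming the standard normal-approximation formulas for a sample proportion (margin of error = z * sqrt(p * (1 - p) / n), and required n = z^2 * p * (1 - p) / E^2 for an effectively infinite population), the Python below computes both. The function names are hypothetical, and p = 0.5 is used as the conservative default.

```python
import math
from statistics import NormalDist


def z_score(confidence: float) -> float:
    """Two-sided critical value, e.g. about 1.96 for 95% confidence."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def margin_of_error(n: int, confidence: float = 0.95, p: float = 0.5) -> float:
    """Margin of error for a sample proportion (normal approximation).

    p = 0.5 is the conservative choice because it maximizes p * (1 - p).
    """
    return z_score(confidence) * math.sqrt(p * (1 - p) / n)


def required_sample_size(margin: float, confidence: float = 0.95, p: float = 0.5) -> int:
    """Smallest n whose margin of error is at most `margin` (no finite-population correction)."""
    z = z_score(confidence)
    return math.ceil(z**2 * p * (1 - p) / margin**2)


if __name__ == "__main__":
    print(f"{margin_of_error(1000):.4f}")  # 0.0310, i.e. about +/- 3.1 points
    print(required_sample_size(0.05))      # 385
```

Because the margin of error shrinks with the square root of n, quadrupling the sample only halves it, which is part of why the cost of collecting more data has to be weighed against the urgency of the request.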
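Week three is described as using SQL to count things and identify duplicates. The review doesn't show the course's actual queries, so this is a hedged sketch that runs standard SQL through Python's built-in sqlite3 module, with an in-memory database standing in for whatever SQL environment you use and a hypothetical `customers` table. The COUNT, COUNT(DISTINCT ...), GROUP BY ... HAVING, and SELECT DISTINCT patterns are common across dialects such as MySQL and Postgres, though details can differ.

```python
import sqlite3

# Hypothetical customer table containing one exact duplicate row.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER, name TEXT, city TEXT);
    INSERT INTO customers VALUES
        (1, 'Ana Lopez', 'Austin'),
        (2, 'Ben Ng', 'Denver'),
        (2, 'Ben Ng', 'Denver'),
        (3, 'Cara Diaz', 'Boston');
    """
)

# Comparing COUNT(*) with COUNT(DISTINCT ...) shows that duplicates exist.
total, distinct_ids = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT customer_id) FROM customers"
).fetchone()

# GROUP BY ... HAVING COUNT(*) > 1 lists which keys are duplicated, and how often.
duplicates = conn.execute(
    """
    SELECT customer_id, COUNT(*) AS n
    FROM customers
    GROUP BY customer_id
    HAVING COUNT(*) > 1
    """
).fetchall()

# SELECT DISTINCT returns a de-duplicated result without modifying the table.
deduped = conn.execute(
    "SELECT DISTINCT customer_id, name, city FROM customers ORDER BY customer_id"
).fetchall()
conn.close()

print(total, distinct_ids)  # 4 3
print(duplicates)           # [(2, 2)]
print(len(deduped))         # 3
```

Checking for duplicates before removing them keeps the cleaning step visible, which fits the review's emphasis on documenting what you did to the data.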
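The five kinds of dirty data the review lists for week two (duplicate, outdated, incomplete, inconsistent, incorrect) can each be turned into a simple check. This Python sketch runs those checks over hypothetical in-memory records. The staleness cutoff and the valid age range are assumptions standing in for real business rules, which depend on the project and on knowing your data.

```python
from collections import Counter
from datetime import date

# Hypothetical customer records; most are "dirty" in one of the ways the review lists.
RECORDS = [
    {"id": 1, "state": "TX", "email": "ana@example.com", "updated": date(2024, 5, 1), "age": 34},
    {"id": 2, "state": "tx", "email": "ben@example.com", "updated": date(2024, 4, 2), "age": 41},
    {"id": 2, "state": "tx", "email": "ben@example.com", "updated": date(2024, 4, 2), "age": 41},
    {"id": 3, "state": "CO", "email": None, "updated": date(2024, 3, 9), "age": 29},
    {"id": 4, "state": "MA", "email": "cara@example.com", "updated": date(2015, 1, 1), "age": 52},
    {"id": 5, "state": "WA", "email": "dan@example.com", "updated": date(2024, 2, 7), "age": -3},
]

# Assumed business rules (hypothetical thresholds).
STALE_BEFORE = date(2020, 1, 1)
VALID_AGES = range(0, 121)


def profile(rows):
    """Return, for each kind of dirty data, the sorted ids of records that fail the check."""
    id_counts = Counter(r["id"] for r in rows)
    issues = {"duplicate": set(), "incomplete": set(), "inconsistent": set(),
              "outdated": set(), "incorrect": set()}
    for r in rows:
        if id_counts[r["id"]] > 1:                   # same key appears more than once
            issues["duplicate"].add(r["id"])
        if any(value is None for value in r.values()):  # a required field is missing
            issues["incomplete"].add(r["id"])
        if r["state"] != r["state"].upper():         # "tx" vs "TX"
            issues["inconsistent"].add(r["id"])
        if r["updated"] < STALE_BEFORE:              # not refreshed recently enough
            issues["outdated"].add(r["id"])
        if r["age"] not in VALID_AGES:               # impossible value
            issues["incorrect"].add(r["id"])
    return {kind: sorted(ids) for kind, ids in issues.items()}


print(profile(RECORDS))
```

Each check only flags records; deciding whether to fix, drop, or keep them is a separate, documented cleaning step.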
Info
Channel: Matt Brattin
Views: 1,179
Keywords: Google Data Analytics Certificate Course 4 of 8 – Process Data from Dirty to Clean, google data analytics certificate, data analytics certificates, data analytics certification, google data analytics, data analytics professional certification, data analyst, data analytics, data analysis, google data analyst, google career certificates, google data analyst course, google certificate data analyst, google certification, google certificate, google data, career change
Id: KCWQ0YO6Enw
Length: 16min 10sec (970 seconds)
Published: Sun May 02 2021