Math you need as a Data Scientist

Video Statistics and Information

Captions
Hello everyone, and welcome to another episode of CodeEmporium, where we are going to talk about some mathematics that you need as a data scientist. I currently have two years of full-time data science experience, and I'm hoping this video can impart some knowledge on how we actually use mathematics in real projects in industry. We are going to break mathematical concepts down into five major tiers. I'll start by mentioning the five concepts, and then for each one we'll go into the details of how we would actually use it, with examples and sub-concepts.

Before getting into the video, please do give it a like, because the more you like, the more this video spreads to everybody in the world, and the process continues. Subscribe if you like this content, and we also have a Discord server, so please join, because we're going to be talking about some amazing things over there and we would love to have you. With that out of the way, let's get back to the video.

Our first topic of mathematical interest is probability. Under probability, let me mention a few concepts that we actually use, starting with distributions. A distribution gives us the probability of occurrence of every item of interest, and distributions come in many forms. The normal distribution can be used to model data with a central tendency. The log-normal distribution can be used to model data where a lot of samples have small values and a small number of samples have large values. The uniform distribution can be used to model data where the probability of picking every item is the same. There are many more applications like this, so it's super important to at least understand these distributions at a high level, and then dig into how they could potentially be related to each other.
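As a quick sketch of how these three distributions behave, here is a plain-Python comparison using the standard library's `random` module; the parameter values are arbitrary, chosen just for illustration:

```python
import random
import statistics

random.seed(0)
n = 100_000

# Draw samples from the three distributions mentioned above.
normal = [random.gauss(0, 1) for _ in range(n)]              # central tendency
lognormal = [random.lognormvariate(0, 1) for _ in range(n)]  # many small, few large
uniform = [random.uniform(0, 1) for _ in range(n)]           # every value equally likely

# For the normal, mean and median agree; for the log-normal,
# the few large values drag the mean well above the median.
print(round(statistics.mean(normal), 3))
print(statistics.mean(lognormal) > statistics.median(lognormal))
```

That mean-above-median gap is the signature of the log-normal's right skew: lots of small samples, a handful of large ones.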
The next topic in probability that I think is super important is maximum likelihood estimation. The broad framework of MLE states: given a model and some training data, we want to estimate the parameters of the model such that the probability of seeing that training data is highest. This is typically used during the training part of machine learning, somewhat behind the scenes, where maximum likelihood estimation is used to derive the loss function we then wish to optimize, or in this case minimize. So if you've ever been interested in how the loss functions for certain machine learning algorithms actually appear, learning maximum likelihood estimation will help you understand where they come from. It is a super important topic to know, although we do not use it directly that much in day-to-day work.

The next topic of interest is Bayesian testing. Bayesian testing is a framework for conducting A/B tests that builds on Bayes' theorem. It's an approach to A/B testing where we start with a prior assumption about how a system works, conduct an experiment to gather some data, and then update our prior beliefs to form a new understanding of the system. We can then use this new understanding to decide, for example, whether or not to make certain product feature changes. It is super useful in the context of A/B testing, and I encourage everybody to know about it.
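To make that Bayesian updating concrete, here is a minimal Beta-Binomial A/B test sketch in plain Python. The conversion counts are invented for illustration, and I'm assuming the common choice of a uniform Beta(1, 1) prior:

```python
import random

random.seed(0)

# Hypothetical observed data: conversions out of visitors for two variants.
conv_a, n_a = 120, 1000
conv_b, n_b = 150, 1000

# With a Beta(1, 1) prior, the posterior over a conversion rate is
# Beta(1 + conversions, 1 + non-conversions).
def draw_rate(conv, n):
    return random.betavariate(1 + conv, 1 + (n - conv))

# Monte Carlo estimate of P(variant B's true rate > variant A's true rate).
trials = 20_000
b_wins = sum(draw_rate(conv_b, n_b) > draw_rate(conv_a, n_a) for _ in range(trials))
print(b_wins / trials)
```

If that probability clears your risk threshold, you'd ship variant B; the prior and the decision threshold are per-experiment choices, not fixed rules.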
Coming up on concept number two, we have statistics, and to segue: like Bayesian testing, another very famous A/B testing framework is hypothesis testing. In hypothesis testing, we state a hypothesis, gather some data, and then try to show how implausible that hypothesis is or is not; based on that, we make certain product-level decisions. A bunch of statistical tests come under this umbrella. For example, we can use the t-test to compare the means of two samples. The chi-squared test is used as a test of independence or a goodness-of-fit test, to see whether two categorical variables are independent of each other. We also have less strict tests that make fewer assumptions about the distribution of the data, such as the Mann-Whitney U test or the Wilcoxon signed-rank test. Learning these statistical tests, and what they do and do not prove, is super important for testing anything in data science, so I strongly encourage you to learn more about them.

The next topic of interest is the central limit theorem. The central limit theorem is something of a hidden concept within the confines of A/B testing: it relates a sample to a population, where the population is the group of interest and the sample is a subset of that group. At its core, the central limit theorem states that the distribution of sample means approaches a normal distribution. It is a requirement behind many statistical tests, including the t-test, so understanding the central limit theorem will help you see where the derivation of the t-test comes from, how we get a test statistic, and how we get p-values.

Coming to our next concept, we have the law of large numbers. When I think of the law of large numbers, I typically think about convergence. Say we have a die: what is its expected value? It is the sum, over each face, of the probability of getting that face times the value of the face, which works out to 3.5.
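The die example above can be simulated directly; this sketch computes the expected value and shows the average of many simulated rolls landing near it:

```python
import random

random.seed(42)

# Expected value of a fair six-sided die: sum of face value times probability.
expected = sum(face * (1 / 6) for face in range(1, 7))  # 3.5

# Simulate many rolls; by the law of large numbers the sample mean
# converges toward the expected value as the number of rolls grows.
rolls = [random.randint(1, 6) for _ in range(200_000)]
average = sum(rolls) / len(rolls)
print(expected, round(average, 2))
```

With only a handful of rolls the average bounces around; it is the large number of trials that pulls it in toward 3.5.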
As you perform simulations of rolling that die, once, twice, a hundred, a thousand times, you will see that the average of all the rolls up to that point eventually converges to 3.5. What the law of large numbers states, more technically, is that the mean of a large number of trials converges to the expected value of the system. This is super useful when carrying out Monte Carlo simulations. In fact, I've made a video on how I simulated a game of bingo with Monte Carlo simulations, and the law of large numbers inherently comes into play because of convergence, so please check that out for more details.

Another concept of interest is confidence intervals. Confidence intervals come into play whenever we're making estimates of anything. For example, we could be estimating the coefficients of a regression model, or estimating the output of the regression itself and how confident we are that it is close to the true value. Take the first case, the coefficients: when we estimate a coefficient, we can build a confidence interval around it, which you've probably seen if you've used something like statsmodels. We predict the coefficient to be some value, but state that there is a 95% chance it lies between a and b, and that range between a and b is the confidence interval. The narrower it is, the higher our confidence, typically; it also helps to check whether both ends have the same sign, for example when doing feature selection from a statsmodels summary table.
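As a sketch of the idea, simpler than the regression-coefficient case and using an invented sample, here is a 95% normal-approximation confidence interval for a mean in plain Python:

```python
import random
import statistics

random.seed(7)

# Hypothetical measurements whose true mean is 50.
sample = [random.gauss(50, 5) for _ in range(400)]

mean = statistics.mean(sample)
stderr = statistics.stdev(sample) / len(sample) ** 0.5

# 95% confidence interval via the normal approximation (z = 1.96).
low, high = mean - 1.96 * stderr, mean + 1.96 * stderr
print(round(low, 2), round(high, 2))
```

statsmodels reports an interval of exactly this flavor next to each coefficient in its regression summary.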
For the second application, the output of the regression model itself, determining an upper and lower bound is typically done with quantile regression, which I've illustrated in detail in another video. It shows that, given a prediction, the true value lies between a and b with, say, a 95% chance, and that range becomes the confidence interval. This is pretty useful for understanding the uncertainty of your estimates, so it's a very important concept to know and I encourage you to learn it.

Coming up on topic number three, we have calculus. Calculus is the study of how things change. A lot of calculus is not necessarily rooted in day-to-day work, but it is required for understanding how, for example, loss functions are created. I mentioned maximum likelihood estimation before: to derive a loss function from the initial maximum likelihood statement, we need a lot of calculus in between. So although you won't use calculus much in your day-to-day work, you still need it to understand how loss functions are created, how optimization works, and how things change in general, which is why I've put it on this list.

Coming up on topic number four, we have algebra. Algebra is a huge umbrella covering just about anything containing variables, but there are several subtopics of high importance. First, matrices and vectors. Computers tend to process information in the form of matrices and vectors to take advantage of parallel processing and make millions upon millions of operations happen much faster. This is how data is typically input to, and processed by, your machine learning models. So understanding matrices and vectors, and how matrix and vector calculations are performed, including multiplication, addition, and even matrix calculus, can be super useful.
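At its simplest, the matrix-vector product that underlies most model computation looks like this plain-Python sketch; real systems use vectorized libraries such as NumPy for speed, but the arithmetic is the same:

```python
# Multiply a weight matrix by an input vector: each output entry is
# the dot product of one row of the matrix with the vector.
def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

weights = [[1.0, 2.0],
           [3.0, 4.0]]
inputs = [1.0, 1.0]
print(matvec(weights, inputs))  # [3.0, 7.0]
```

A batch of inputs is just more columns, which is exactly where parallel hardware earns its keep.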
Another point of interest here is linear regression. I put it as its own bullet point because I encourage everyone to take some data, some two-dimensional set of points, actually fit a line through it, and use the mathematics to derive the equation of that line. This is a fundamental template for how you can expect other machine learning models to work; with other models it's going to be really hard to derive things on paper, but linear regression is a great start.

Another topic of interest is graphs, and here I mean the visual Cartesian coordinate system. We can use graphs for so many things, from data visualization to visualizing how the loss decreases over time during training. In terms of data visualization, some of the graphs you would need are line charts, box plots, scatter plots, choropleths (which are like heat maps), and many more. Having all of these data visualization tools in your arsenal will help you visualize any kind of data that's out there and explain it in the form of pictures, which is so much easier for stakeholders to understand, and even for you to understand internally. So please do learn that as well.

Coming up next, we also have evaluation functions. For classification problems, to evaluate how wrong or right our model is, we have metrics like precision, recall, and the F-score; for regression models, you would use something like the mean absolute error or the mean squared error. Learning these metrics will help you determine the performance of your models, so it's super useful to learn.
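The fit-a-line exercise suggested above can be done in a few lines. This sketch uses the closed-form least-squares solution for a single feature on made-up noisy data, then evaluates the fit with one of the regression metrics just mentioned, mean squared error:

```python
import random

random.seed(3)

# Made-up 2-D data scattered around the line y = 2x + 1.
xs = [i / 10 for i in range(100)]
ys = [2 * x + 1 + random.gauss(0, 0.5) for x in xs]

# Closed-form least squares: slope = cov(x, y) / var(x).
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx

# Mean squared error of the fitted line over the same data.
mse = sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / n
print(round(slope, 2), round(intercept, 2), round(mse, 2))
```

The recovered slope and intercept land near the true 2 and 1, and the MSE hovers near the noise variance, which is exactly what a good fit should leave behind.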
Coming up on the fifth and final topic of today's video: optimization algorithms. Optimization algorithms essentially determine how we learn. A big prerequisite for understanding optimization is functions themselves. Functions can have maxima and minima; typically, when learning in machine learning, the loss function starts at some arbitrary point, and we have to figure out a way to minimize that loss as much as possible in order to find a global minimum. So understanding functions, and the concepts of maxima and minima, is all intertwined with understanding optimization algorithms in general. A good first step is understanding how gradient descent works; from there you can tweak it a little to get stochastic gradient descent, then mini-batch gradient descent, AdaGrad, AdaDelta, RMSProp, Adam, and so on. There are so many out there, and you can keep tweaking these learning strategies to get new and potentially even better ones depending on your problem. I made a video on optimization algorithms that describes everything I just mentioned, including the differences between these algorithms, so I do encourage you to check that out.
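As a minimal sketch of the gradient descent just described, here is plain Python minimizing a simple one-dimensional loss, f(w) = (w - 3)^2, whose minimum we know sits at w = 3; the starting point and learning rate are arbitrary choices for illustration:

```python
# Gradient descent on f(w) = (w - 3)^2, which has its minimum at w = 3.
def grad(w):
    return 2 * (w - 3)  # derivative of the loss with respect to w

w = 0.0                 # arbitrary starting point
learning_rate = 0.1
for _ in range(100):
    w -= learning_rate * grad(w)  # step downhill along the gradient

print(round(w, 4))  # converges to ~3.0
```

The stochastic and mini-batch variants replace the exact gradient with an estimate from a subset of the data, and methods like RMSProp and Adam adapt the step size per parameter, but this update rule is the core of all of them.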
And that's all I have for you today. Note that this list of five mathematical concepts is definitely non-exhaustive; probability and statistics alone are used in so many more places that I could just go on and on, and as you've probably noticed, there's a lot of overlap between some of the concepts and where they're used. But I hope this still gave you a good idea and a good starting point for where to focus your time, depending on what you want to learn in mathematics, so that you can become a data scientist really soon, or just continue to be one if you already are. All of this is super fun, and I hope you get started learning. Thank you all so much for watching today. If you liked the video, please drop a like, subscribe, ring that bell, and join us on Discord, because we're going to create an amazing community with you. Until next time, I'll see you soon. Bye!
Info
Channel: CodeEmporium
Views: 1,849
Keywords: Machine Learning, Deep Learning, Data Science, Artificial Intelligence, Neural Network
Id: Q51FH0-cIZw
Length: 15min 25sec (925 seconds)
Published: Mon Oct 25 2021