A/B Testing for Data Science (Python and R) | June 30th, 2021

Captions

Now? Yes, your whole screen. Yes. Okay, do you see this blue screen that says "About the school"? We see your desktop. Let me... okay, this is actually the first time I'm trying YouTube live streaming, so thank you for your patience. How about now? Yes, this is great. Okay, great. And you can see both the presentation and me? Okay, great. Yes. [Laughter] I play second fiddle to none, even a presentation.

All right, so on the third bullet point, I was just talking about how our bootcamp is the only one that offers both Python and R. Over the course of the bootcamp you will be asked to complete four industry projects that demonstrate comprehension of the modules and the concepts, including the capstone project that will tie everything together. Depending on the cohort, there is also an opportunity for you to work on project pitches from our industry partners, so that is a great opportunity for you to increase your visibility in the data science community.

We have very strong job placement support in the form of mentors. There are one-on-one mentoring sessions where, in addition to homework and project help, you also have career help, in that a lot of our mentors are alumni who graduated from the academy, and you can get their take on how they landed a job, the work culture at a particular company that you're interested in working at, or anything that might be relevant to your job search. In addition to that, our career services support offers mock interviews and resume reviews. We keep close contact with our alumni, and we invite them back to alumni networking events, alumni workshops, and coding workshops to keep their skills sharp. You are added to our job portal upon graduation, so that you will be on a select list of people visible to our industry partners, who know that you graduated from the bootcamp and have survived the rigors of the bootcamp. You have lifetime access to this, so after you land a job, if you want to,
after a few years at that job, come back because you are curious about another sector or industry, you can toggle your visibility in this job portal back on and continue searching in another sector with your added experience. We want to keep our network growing, and we value our network, so you will find a lot of added value in having graduated from the New York Data Science Academy long after graduation. We have 2,000 alumni working across the globe, and our diversity is reflected in our faculty, our student body, and our industry partners, with division offices all over the world. We're also proud to say that we have been consistently highly ranked on both SwitchUp and Course Report. I'm going to step into this meeting.

There are two delivery formats for our bootcamp. One is in-person slash live streaming: you have about a two-to-three-hour lecture in the morning, after which you break for lunch for about an hour to an hour and 15 minutes, and then you return to the afternoon lecture, which is also two to three hours. The rest of the afternoon you have available to ask instructors questions and collaborate with your peers on projects and homework. This one is 12 weeks, and the next cohort begins July 6th. If you miss this window, you have another opportunity coming up on September 27th; you may apply for that one.

The second option is interactive distance learning, which is our online option, where lectures are pre-recorded. Aside from that, the level of interactivity with instructors is the same, the amount of career support is the same, and the content is identical. It just provides a bit more flexibility in terms of the timing of the lectures: you can watch them after work or on a schedule that better suits you. For the full-time program you get an extra four weeks to complete your work, so instead of the 12 weeks you get for live learning, you get 16 weeks for full time, or, if you're a full-time student or full-time professional, you get 24 weeks to
complete it. There is also the data analytics bootcamp that I mentioned earlier. This one is also interactive distance learning, part-time, 12 weeks. So you have several options to choose from. The start dates are July 6th and August 16th for the interactive distance learning pre-recorded lecture option, and July 6th for data analytics.

If what I say interests you, feel free to go ahead and submit an application. The application process is very simple and quick. You submit an application and then schedule an interview, in which someone from the admissions team, either myself or Noah Escobar, will interview you to assess your goals for the program, what interests you in data science, and your fit for our program. If we find you a good fit, we'll then send you a technical assessment that you'll complete within 48 hours. It will be forwarded to one of our data science instructors, who will grade it and provide feedback on the areas suggested for review before the beginning of the bootcamp.

If you apply for the August cohort, we are running a 10% discount: you'll be saving almost $2,000 on your tuition if you decide to apply for our August interactive distance learning cohort. So that's something to consider; that's quite a bit of cash you would be saving. If you have any more questions, please feel free to email me at this email address, and I'll also provide you my personal email; I'll drop that in the chat. Let me figure this out: stop share, and I will turn the... oh, are there any questions? I'm sorry, I wasn't paying attention to the chat box. Let's see... I don't think so. If you have any questions for me specifically, go ahead and email me, and I'll turn the presentation over to Carlos now.

Hi, can you hear me? I can hear you. Right. Sophia, I believe you may have shared your email just with me, so on the Zoom you may want to share with all the participants, because I think your email went just to me. Oh yeah,
Zoom is so clunky. I'm not taking any fault for this; I'm blaming you. That's all right. Okay, let's just try sharing again with everyone. Great, so everyone has Sophia's email now. And as I start, could you actually just go and mute people? As the host you are able to mute everyone, so we don't have background noise.

All right, thank you everyone for joining today. I'll start sharing my screen and we'll start with today's webinar. All right, can you see my screen now? You can give me a thumbs up in your camera or in the Zoom chat. That's great, thank you. I still hear some background noise, so if you could mute yourself; otherwise Sophia will go around and mute your mic so that everyone is muted for today.

On the interaction today: I just noticed we have quite a lot of people here. Normally during our classes we have small groups of people, everyone has the mic on, everyone talks, and it's a conversation. However, today we have a lot of people, and it's impossible for everyone to be on with an open mic because, as you saw, some people have background noise and so on. So I encourage you to use the chat as the medium of communication. I'll be asking some questions, and I encourage you to try to answer those questions, at least in your head, to make this more interactive. You can also share your answers in the chat, and at the end we'll have a proper Q&A session, so you can also hold your questions for the end.

All right, so my name is Carlos Afonso. I am a data science instructor at the New York City Data Science Academy, and it's my pleasure to introduce you to this topic today: A/B testing for data science. We'll be using some coding in Python and R. This is actually an introductory webinar, so you'll learn some important statistical topics that are involved in A/B testing, and you'll also get an idea of what it is like to code in Python and R. Here's the outline for today, and
actually I'm going to increase this so you can see it bigger. Again, if you could mute yourself, or Sophia could mute you, that would be great, so we don't have the background noise, please. Okay, I'm having trouble administratively muting people: I mute them and then they come back on, so they would have to do it themselves. I tried to do that earlier with the guy blaring his music and it just kept blaring. Okay, so if someone keeps being too noisy, should I remove them from the Zoom? I'm sorry, did you all hear that? Yeah, if you are too disruptive we'll have to remove you. I'm sorry about that; it's for the benefit of everyone else.

All right, so the outline for today. We're going to talk about A/B testing: an introduction to A/B testing, what it is, and why we use A/B testing. We'll talk about some general examples and some specific examples, and then we'll explain how A/B tests are done. In particular, we'll go into detail about the hypothesis testing part and the concepts involved in hypothesis testing, such as the p-value, which is a concept that is often difficult to understand and is often misunderstood, so we'll have a good explanation of what the p-value is. We'll actually be using what's called the permutation test to do the hypothesis testing part of A/B testing, so that's another thing you'll be learning today: what the permutation test is, why we use it, and how it's done, explained step by step. After the introduction and the general concepts, where we'll use some visualizations to explain those concepts, we'll have a coding session where we'll code these concepts from scratch in both R and Python. At the end, if we have extra time, I'll provide a more general solution, and I'll even show you a Shiny app where you can manipulate all of these things. And of course at the end we'll have a Q&A session. So the learning objectives
for today are separated into two important parts: statistics and coding. In terms of statistics, you'll learn what A/B tests are, what hypothesis testing is, what p-values and permutation tests are, and other concepts related to hypothesis testing. Regarding coding, you'll learn about variables, loops, and functions. Okay, so... [Music] So apparently you can see some gray on the bottom. Is it better now? Okay, so it's good.

All right, so let's start. You are here because you're interested in data science, and maybe particularly in A/B testing, so I'll ask you: what's an A/B test? You can try to answer that in your head, or you can share your answer in the chat so you can interact with everyone else. So what's an A/B test? Have you used A/B tests? Have you read or started learning about A/B tests? What do you think an A/B test is? I'll give you a moment to think about that and to provide an answer. So let's see if anyone shared: "a test between two alternative paths to determine which accomplishes the goal better." That's a great answer; thank you, George, for sharing.

Yes, in simple terms an A/B test is an experiment to compare two competing options, and we call them A and B. Those options could be, for example, two different treatments in a medical context; two different designs, say two different adverts or two different versions of the same advert, to see which one would be better; two different web designs; two different products or two different versions of the same product; or the same product offered at two different prices, to see what in the end gives you better profit; and so on. So it could be any two different options for a problem that you are trying to improve.

Okay, and why do we use an A/B test? Well, we want to determine if those two competing options, A and B, are different, and how different, in a statistical sense. That's where we'll need to use hypothesis testing, and today we'll be using a permutation
test to do the hypothesis testing part. Normally we don't just want to know if A and B are different; normally you want to know which of them is better, if one of them is better than the other. Why? Because you are trying to solve a problem, and you are trying to find which of these two options is better for solving the problem. Again, "better" depends on what type of question or problem you are trying to solve, and it's defined in the sense that it will be the better one at answering that question or achieving the goal at hand. For example, which of the two options would be better for increasing the number of customers for your company, or for increasing the profit of your company? Those are two different goals that may require different options to achieve them.

Some general examples. We can have two different soil treatments, to try to understand which of them may promote better seed germination. These statistical concepts have a historical root in agriculture, and to this day they are still used a lot in improving crops and so on in agriculture. More modern examples: two web headlines, say of a news article or of a blog post, to understand which of them generates more clicks; or two web designs of landing pages, to understand which of them may lead to more conversions. Maybe you are trying to sell your product online and you have different web designs, to see which one of them leads to more sales or more conversions. Or, for the same product, two different prices, to see which of them could yield a higher net profit, or, on the other hand, which of them could lead to more new customers. Those are two different goals, and depending on the state of your business, you may be at a stage where you are more interested in acquiring more customers, and you may be using different options to do that, or maybe you are an established business and
just want to increase your net profit. Or, coming back to the medical examples, two different therapies, to understand which of them may be more effective at suppressing cancer. Within the medical context, those two options A and B are often called the control group and the treatment group, or the control option and the treatment option. The control group is normally the group that is exposed to no treatment or to a standard treatment, the current treatment, and the treatment group is the one that is exposed to the new treatment, the one we think is going to be better than no treatment at all or than the existing standard treatment. Moving forward, for today we'll just be using the A and B nomenclature; we will not be using control and treatment group, but you may hear that nomenclature when it relates to A/B testing and hypothesis testing.

Right, so those are general examples. Now some more specific examples. There's one famous example from Microsoft, the Bing search engine: one single A/B test about changing the way the Bing search engine displayed headlines led to a 12% increase in revenue, and that was equivalent to more than 100 million dollars per year in the US alone. That's a very famous example of where one A/B test led to a huge profit. Actually, the story is that the A/B test had been proposed by someone, and people did not act on it; then someone decided, "Oh, this may be a good A/B test," and they did it, and then they realized it was a very profitable one.

An example from Amazon, and this one may be familiar if you use Amazon: nowadays you'll see, practically all the time, Amazon offering you a credit card when you go to check out your shopping cart. That didn't happen by accident. In the past they didn't have that option. They did A/B testing and experiments, and they realized that moving the credit card offers from the home page, where they were in the past, to the shopping cart page
boosted the profits by tens of millions of dollars annually. Okay, so that's another example of A/B testing having a huge impact on a business. However, it's not always perfect, and it does not always work; most of the time it doesn't work that well. Data from both Google and Bing showed that only about 10 to 20% of experiments actually generate positive results, in the sense that they are actually implemented in the business and translate into improvements in the business. You can read all of these stories in this article, "The Surprising Power of Online Experiments," published in the Harvard Business Review in 2017. You'll have all these slides: after today we will send all the materials to you by email, so you'll have all of this, and these are links you can click to be directed to that source.

All right, so A/B tests can be very powerful, but how are they done? Actually, that's a good question. Maybe some of you already have an idea of how A/B tests are done, so let's check everybody's understanding of how an A/B test is done. You can answer the question in your head, write it down by yourself with pen and paper, or try to answer in the chat and share with us what you think the steps of an A/B test are. I'll give you a moment to think: what are the steps of an A/B test? Just the general steps; they should fit on one hand, so up to five steps would be fine, or whichever way you understand A/B tests. All right: "hypothesis test." Great.

All right, so I'm going to give you my answer, and here we go. This is how I break down the process of doing an A/B test, and we are going to focus more on the operational steps, steps one to four. We're going to cover these in detail. But before you start an A/B test, you actually have to define the A/B test, so there's what I call a step zero, where you come up with some idea for an A/B test
and you define the A/B test: what's your question, what's your goal, what data do you have access to, what subjects you will be using for your A/B test, what options you are going to be testing, and what the test statistic is, the thing that you're going to be measuring to compare the two options to see if they are actually different or not. So that's a very important step, the definition step, where you define the A/B test that you want to do.

What we're going to focus on more today is the operational part of it: once you know what you want to do, how is it done? You need to have a set of subjects that you can use for your A/B test, so step one is to identify the whole set of subjects that you'll use for the A/B test. Step two is to randomly assign those subjects to two separate groups that we call A and B, and it's very important that you assign those subjects to the groups randomly. Step three is to expose each of the groups to the different treatments, or the different options, A and B, and measure the results: how do they perform in terms of your test statistic? Then step four is actually performing the hypothesis test to determine if the observed difference is statistically significant or not. Today we'll be doing this one with the permutation test; we'll be showing how to do hypothesis testing with a permutation test. And finally there's a step five, which we are not going to be discussing much today either: making a decision, and potentially taking action, based on the test results. If the test results are good enough, you'll decide on implementing that change in your business, actually take the action of making that change, and get the benefits of that test.

So that's an overview of the steps, and I will use this animation to illustrate them. First we need to have a pool of subjects that we
identify to use in our A/B test, and in this case it is this pool of 20 dots; each dot represents a person. That's our step one, where we have all the subjects that we selected to use in our A/B test. In step two we're going to randomly assign these 20 subjects to two separate groups, the group on the top and the group on the bottom; each of them is going to have 10 people. So let's randomly assign them, and you see there was randomization in the way the bullets were assigned to the two different groups. Now we have two different groups, A and B, each of them in this case with 10 people.

Step three is to actually subject the people in the groups to the treatments, treatment A, here represented with the purple color, and treatment B, here represented with the yellow color, and see what happens: will they convert or not? In this case we're just talking about yes or no: is the person going to click on your ad, going to buy your product, or not? So we're just talking about conversion rates, the yes conversion rate. That's when we subject them to the different treatments, and then we see that some people, the red ones, did not convert, and the green ones did convert. Now what you have to do is go to each group and count how many of them did convert. In this case, for group A, if you count, we have four, six, seven greens, so we have seven out of ten; that's a 70% conversion rate. For B we had one, two, three, four greens, four conversions, and that's four out of ten; that's a 40% conversion rate.

So we did our A/B test, where we split a total set of 20 subjects into two groups, A and B, and we sent 10 people to each of the groups, so group A had 10 people and group B had 10 people. Then we subjected each of the groups to the different options, B and A. For B we had a 40% conversion rate: four people out of the 10 converted, meaning they bought our product, clicked
our ad, and so on. For group A we had a 70% conversion rate: seven people out of the 10 converted. Now what we are interested in is the difference between the two options. In this case the difference between A and B is going to be 70 minus 40, and that is 30. So that's the end of step three: we measured the test statistic, and now we see there's a difference of 30 percent between option A and option B. Now the question in step four is: is that difference statistically significant or not? To answer that we need to do hypothesis testing, and we'll be doing it with a permutation test, so that's what's coming next.

All right, so here's the summary table of this little experiment that we just conducted. We are using this small experiment with small numbers so that we can see everything: we can see all the subjects moving around through the process, we can compute everything from scratch, and you'll be able to see all the code working and all the solutions happening in real time. Normally an A/B test does not use such small numbers; you need larger numbers. We're just using these small numbers so that we can compute everything and understand everything very easily.

So we have our A/B test, with 10 people in option A and 10 people in option B. We subjected them to the different options, and we noticed that for option A we had seven yeses and three nos, for a total of ten, and for option B we had four yeses and six nos. So the conversion rate, the yes rate, for A was 70 percent, which is seven over ten. We'll keep talking in percentages: seven over ten is 0.7, which in percent is 70 percent. The conversion rate, the yes rate, for group B was four in 10, that is, 40 percent. Now, our test statistic, the thing that we are using to measure if there's a real difference or not, is the difference in the yes rate, and we are defining the difference as A minus B, so 70 minus 40, and 70
minus 40 is 30. Okay, so this is our observed difference. We did the A/B test experiment: we selected the subjects, we assigned the subjects randomly, we subjected them to the different options A and B, and we measured the conversion rate for each group. For A it was 70 percent, for B it was 40 percent, and the difference between A and B is 30 percent. Now the question is: is this difference statistically significant or not? Basically, what you want to understand to answer this question is whether this difference of 30 percent is due to a real difference between options A and B, or whether it could just be due to random chance alone. That's what hypothesis testing is going to do for us.

Here's another view of how the A/B test works, from the perspective of a subject participating in the A/B test. Person one was randomly assigned to group A and then, when subjected to option A, actually converted, so this was someone who, say, bought our product or clicked on our ad. Person two was also randomly assigned to group A and did not convert; that was a red one. Person three was randomly assigned to group B and converted, and so on and so forth. So you get another view of how an A/B test works: people are randomly assigned to groups, and then within each group they are exposed to the corresponding option, A or B, and we measure if they convert or do not convert. In the end we calculate how many of them actually converted, and we compute the difference of the conversion rates between groups A and B. That's our test statistic: the difference of conversion rates between A and B.

Okay, so now we get to step four, which is the hypothesis testing. Again, in hypothesis testing we are trying to understand if the observed difference, in this case the 30 percent difference between the two groups, is statistically significant or not. Now, thanks to step two, which
was randomization, the fact that we assigned the people to the different groups randomly, we can say that any difference we see between A and B can only be due to one of two things. It could be due to random chance: it just so happened that the randomization put everyone who, for whatever reason, would always say yes in one group, and the people who would always say no in the other group. So we can get some difference that is just due to random chance, just from the randomization that we used to assign the subjects to the different groups, and that's the null hypothesis. Or, if it's not due to random chance, the alternative is that it is due to a real difference between options A and B: there's something significant and important about options A and B that makes people in one group more likely to convert than people in the other group. In the medical example, between two drugs, say, or two different treatments, maybe one treatment is actually much better than the other at treating a certain disease. The idea that the difference we observed is just due to random chance is the null hypothesis, and the other hypothesis, that it's actually due to some real difference, is called the alternative hypothesis.

So, thanks to the fact that we used randomization in the assignment of people to the different groups, any difference we observe is due either to random chance or to a real difference between the two options. The hypothesis that the difference we observe is just due to random chance is the null hypothesis, and the hypothesis that the difference we observe is due to a real difference is the alternative hypothesis. Hypothesis testing is a way of testing which of these may actually be true. More technically, what we'll be doing in hypothesis testing will be
assuming that random chance, the null hypothesis, is true, and we'll be measuring a statistic that will tell us if we should keep assuming that the null hypothesis is true or if we should reject the null hypothesis. So during hypothesis testing, the question we are trying to answer is whether random chance alone, the null hypothesis, could be a reasonable explanation for the observed difference that we are seeing in our experiment. During the hypothesis testing process we will assume that the null hypothesis is true, we'll create the model that corresponds to the null hypothesis, which is a probability model that we are representing here in this graph (I'll explain it in a moment), and we'll test whether the observed difference that we see in our experiment would be a reasonable outcome under the null model, that is, whether it could be just due to random chance. In other words, is the observed difference within the random variability of the null model? We're going to do this step by step, and we're going to use code to understand these concepts better.

So in hypothesis testing we are trying to understand if the null hypothesis is true or not, and the null hypothesis is normally that there is no difference between the groups, that A and B are the same. So the null hypothesis is that A is equal to B: option A is equal to option B, and there's no real difference between the two of them. The alternative hypothesis is that there is a difference between A and B: A and B are actually different, and the difference we are observing is due to a difference between A and B. Normally, when we talk about the null hypothesis and hypothesis testing, we are talking about this formulation, where either A is the same as B or A is different from B, and this is what's called a two-way test, or two-way hypothesis test. There is another version, which is the one-way test, the one-way hypothesis test, where we are not
testing whether the options are the same or different; we are testing whether one is better than the other. If we are trying to test whether, say, A is better than B, that is our alternative hypothesis, and the null hypothesis has to be the complement: that A is worse than or the same as B. So "A is not better than B" is the null hypothesis in our one-way test, and the alternative hypothesis in our one-way test is that A is better than B. In our experiment we observed that A was better than B by 30 percent, so we can also run the one-way hypothesis test to check whether A is really better than B or not. We'll do the two versions: the two-way hypothesis test and the one-way hypothesis test.

All right, I will explain this graph later, but here is what it represents. This is a density plot, and the density line represents the null hypothesis: the expected output of an experiment like the one we did, comparing those two groups A and B, just due to random chance. That is, what differences would you expect to see between groups A and B just due to random chance, just by assigning the subjects randomly multiple times? What you see here, the vertical red line, is the observed value of 30 percent; this is 30 percent here, and on the other side what you see is minus 30 percent. In orange you'll see, and you'll come to understand, that this orange area is actually the p-value. The orange area represents all the values that we could get just due to random chance that are larger, more extreme, than the one we observed. So if here is the 30 percent that we observed, all of these values are larger than 30 percent, they are more extreme than the one we observed, and those are the values that count for computing the p-value. Similarly, all the ones that are smaller than minus 30 percent would also be more extreme for the
two-way test. So in the two-way test we compute the area of the orange values on the left and on the right, and those two areas together are the p-value. For the one-way test we compute only the orange area on the right, and that area is the p-value. The overall area under the curve is one, the total probability, and the orange areas are the probability of obtaining values that are as extreme as or more extreme than the one we observed. We'll do that step by step and understand better what it means. Now a little more detail about what the p-value is and what else is involved in doing a hypothesis test. The p-value is a very important concept, but it is often difficult to understand and sometimes misunderstood. Here is the definition, and I will keep explaining and illustrating how this orange area is the p-value. The p-value is contingent on your null hypothesis: given the random-chance probability model that embodies the null hypothesis (remember, the null hypothesis is that there's no difference between the two options), the p-value is the probability of obtaining results as extreme as or more extreme than the ones we observed in our experiment. That's why the p-value is this area in orange: the value you observed is this vertical line at 30 percent, and everything above it is more extreme, so that area represents all the values that are at least as extreme as the one observed. That is the p-value in the one-way test. In the two-way test we also count the values that are smaller than minus 30 percent, because they are more extreme in absolute value: minus 50 in absolute value is 50, which is larger than the 30 we observed. That's the meaning of the p-value, and we will keep
repeating it over and over, and we're actually going to compute it with code from scratch so we understand better how it's computed. There's another important concept in hypothesis testing: the significance level, often abbreviated as alpha, the Greek letter. The significance level is the probability threshold that we accept for unusual or extreme values. It is often set at 5 percent, or 0.05, and it must be defined before the experiment is run, at the initial stage where you're defining how you're going to do the A/B test. Alpha is the probability that we accept of committing a type I error. A type I error is a false positive: it is when we wrongly conclude that an effect is real when in reality it's just due to chance. So if we use a significance level of 0.05, we are accepting a 5 percent chance of concluding that there is a real difference between options A and B when in reality there is none and the difference was just due to chance. So how do we make the decision between the null hypothesis and the alternative hypothesis? By comparing the significance level alpha with the p-value. If our p-value is larger than or equal to alpha, we retain the null hypothesis and conclude that the observed difference could be just due to random chance; we have no evidence of a real difference between A and B. On the other hand, if the p-value is smaller than alpha, we reject the null hypothesis and conclude in favor of the alternative hypothesis: there is a real difference between options A and B, and the difference that we observed in our experiment is due to a real
difference between options A and B. So what does it mean for the p-value to be larger than alpha? Again, the p-value is the probability of obtaining values that are as extreme as or more extreme than the one you observed, just due to random chance. If that probability is high, there's a high probability that the difference we observed is just due to random chance; that's why, with a large p-value, we conclude that the difference is not really due to a real difference between A and B. And we need the significance level alpha to make the decision: alpha tells us what counts as a large p-value and what counts as a small one. If you use 5 percent for alpha, that is the threshold: if the p-value is larger than or equal to 5 percent, we retain the null hypothesis; if the p-value is smaller than 5 percent, we reject the null hypothesis in favor of the alternative hypothesis. So those are the main concepts in hypothesis testing: the p-value, the significance level, what they mean, and how they are used to make the decision. Alpha, the significance level, is the probability that we accept of making a false positive, a type I error: concluding that there's a real difference when in reality there is not and it's just due to chance. The p-value is the probability of obtaining results as extreme as the observed ones just due to random chance. So if there's a high probability of observing a result like that just due to random chance, we conclude that the result is just due to random chance; if there's a very low probability of observing such a result just due to random chance, we conclude that this is not random chance, there is a real difference between A and B. So that's the basis of how hypothesis
testing is done and how the decision is made. The significance level alpha we set at a certain threshold, and it's common to use 5 percent, but the p-value we need to compute, and now I'm going to show you how. Notice we talked about the type I error, the false positive, because its probability is alpha, the significance level. There's also a type II error, which is the complementary one, a false negative: when we wrongly conclude that there is no effect, that the difference we observed was due to chance, when it was actually real. So we can have these two types of errors, the type I error, which is the false positive, and the type II error, which is the false negative. So how can we compute the p-value? One way is to use one of the classic statistical tests, but we're not going to do that. We're going to use the permutation method, because it lets us see how the p-value is actually computed. A permutation test is a resampling procedure used for hypothesis testing. Resampling means repeatedly sampling from the values in our observed data to assess the random variability of a statistic, in our case the metric that we're using to compare options A and B. There are two main types of resampling procedures. The bootstrap is resampling with replacement. To picture what that means, imagine that each data point is a ball and you put all the balls in a bag; resampling is basically reaching into the bag, taking out one of the balls, and looking at it to see what it is. And you
can do resampling with or without replacement. With replacement means you go to your bag, take one ball, which is one data point, look at it, take note of it, and then put it back in, so it's available the next time you draw. Without replacement means you take one ball, take note of what you took out, and put it aside; it doesn't go back in the bag, so it's not available for the next draw. So the bootstrap is resampling with replacement, and permutation is resampling without replacement: you take a ball, look at it, and put it away, so now you have fewer balls to draw from. Permutation is the one that is used for hypothesis testing; bootstrapping is used for other purposes, such as assessing the reliability of a statistical estimate, like in your statistical models. We're going to be talking about permutation tests. So the permutation test is a resampling procedure that we use for hypothesis testing, and we're going to explain it step by step so we see it in action. In more detail, permutation is the process of combining two or more data samples, in our case our two samples, and randomly reallocating the values to new samples, so that we can see how the test statistic varies as we repeat this resampling. It's going to be easier to see with the examples in the next slide. For us, the permutation test is a way to create the null model. The p-value is computed from the null model, and the null model is the variability we would expect in our test statistic just due to random chance. That's what the permutation test gives us.
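To make the bag-of-balls picture concrete, here is a minimal Python sketch of the two kinds of resampling. The five-element bag and the seed value are made up for illustration; they are not from the talk.

```python
import random

random.seed(0)  # arbitrary seed, only so the run is repeatable

# The "bag": one entry per data point (1 = yes, 0 = no). Illustrative data only.
bag = [1, 1, 1, 0, 0]

# Bootstrap: resampling WITH replacement. Each ball goes back in the bag,
# so the same data point can be drawn more than once.
bootstrap_sample = random.choices(bag, k=len(bag))

# Permutation: resampling WITHOUT replacement. Each ball is put aside after
# it is drawn, so the result is the original data in a new random order.
permutation_sample = random.sample(bag, k=len(bag))

print("bootstrap  :", bootstrap_sample)
print("permutation:", permutation_sample)
```

The permutation always contains exactly the same balls as the bag, while the bootstrap sample may contain a different number of ones and zeros than the original data.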
Okay, so the advantage of the permutation test, especially over the classic statistical tests, is that it makes no distributional assumptions about the data: there's no assumption of normality or anything like it. The null model is created directly from the data itself, and you'll see how next. So that's one big advantage of the permutation test. I have here an explanation of how the permutation test is done, broken down into individual steps. Remember, we are using the permutation test for what was step four of an A/B test, the hypothesis testing part, to compute the p-value, but the permutation test is itself a whole procedure. We start with step zero, having the A/B test results. We had two groups: in group A 70 percent of the people converted, so we have seven greens, and in group B we had 40 percent conversion, so we have four greens. That's the starting point for a permutation test. The first step is to put all these results, which are currently separated into groups A and B, together in a single data set, in a bag. So in step one I put them all together in one bag, and I even ordered the greens and the reds so it's easy to see how many we have in total: when we combine A and B we have eleven greens and nine reds, because we had seven greens from A and four from B, and seven plus four is eleven. There's no separation anymore; they are all together in a single bag. Steps two to five perform one permutation, and now you'll understand what a permutation is. A permutation is
to randomly take a set of balls from the bag and assign it to group A, and then take another random set of balls from the bag and assign it to group B. Broken down further: first we shuffle the bag, then we draw a random sample without replacement for group A, then another random sample without replacement for group B, and finally we measure the test statistic, the difference between A and B. That's what I'm doing now, going from step one, where we have all the balls in the bag, to the one permutation we have here, steps two to five all together. I randomly assign all the balls that were in the bag to a resample for group A and a resample for group B, and now we measure our statistic again. For A we now got five greens, a 50 percent conversion rate, and for B we got six greens, a 60 percent conversion rate, so the difference between A and B is 50 minus 60, which is minus 10 percent. This is one permutation only. In the permutation test we do this process many, many times. At each permutation we get a value for our statistic, in this case minus 10, each time a random value, and with all those values we'll be able to create the density plot of the null model, the null hypothesis. So that's one permutation, and step six is to do many permutations: repeat steps two to five many times, recording the test statistic each time, and then use all those test statistics to create the probability model that represents the null hypothesis. For the hypothesis test, we use those results to compute the p-value: we'll have the null model, the probability model
for the null hypothesis, and then we'll be able to compute the p-value as the fraction of all the values we got that are as extreme as or more extreme than the one we observed. Again, we're going to compute that from scratch so you can see how it works. Another way to see how the permutation test works is from the perspective of one individual in the process. Think of this person who was a green result in group A: we put them in the bag with all the others, and they just happened to be randomly assigned back to A. But this other person, who was also in group A in our results, went into the bag and, when we did the resampling, ended up in group B. And so on: data points are randomly assigned to groups A and B, so points that came from A can end up in A again or in B, and vice versa. Now suppose we did this permutation process 100 times. We did it one time and got minus 10 percent. Now we put all the balls back in the bag and do another resampling, and we could get minus 10 percent again or a different value, and so on. We do that 100 times, and each time we get a value for our statistic. These are the results I got from 100 permutations (I'm going to do this in code, so you'll see it in action): I got the values minus 50, minus 30, minus 10, 10, 30, and 50.
I got minus 50 four times, minus 30 twelve times, minus 10 thirty times, 10 thirty-three times, 30 sixteen times, and 50 five times. So this is the table of our permutation results, and it is represented here in this bar chart. On the horizontal axis is the test statistic, minus 50, minus 30, minus 10, 10, 30, and 50, and the height of each bar is the count of how many times we got that value in our permutation results. Under the null hypothesis there's no difference, so you get more results around zero, more results for minus 10 and 10. But we got quite a few results that are as extreme as or more extreme than our observed result. When we did our experiment we observed a difference between A and B of 30 percent, and these are the results that are at least that extreme: we got 30 itself sixteen times, and we even got five 50 percent differences, where A is 50 percent better than B. These are the counts we'll be using to compute the p-value. These ones we use for the one-way (one-sided) test, but for the two-way test we also need to consider the ones on the lower end, because minus 30 is as extreme as 30.
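As a rough illustration of where a table like this comes from, the following Python sketch runs 100 permutations of the 11-yes, 9-no bag and tabulates the resulting differences as a text bar chart. The seed and the variable names are assumptions, so the counts it prints will generally differ from the ones read out in the talk.

```python
import random
from collections import Counter

random.seed(1)  # arbitrary seed; the talk's seed is not given

a, b = 10, 10          # group sizes from the talk
a_yes, b_yes = 7, 4    # observed yes counts from the talk
t_yes = a_yes + b_yes
t_no = (a + b) - t_yes

bag = [1] * t_yes + [0] * t_no   # 11 ones (yes) and 9 zeros (no)
p = 100                          # number of permutations

perm_res = []
for _ in range(p):
    shuffled = random.sample(bag, k=len(bag))    # shuffle without replacement
    a_rs, b_rs = shuffled[:a], shuffled[a:]      # resamples for A and B
    # Test statistic: difference in yes rate, in percentage points
    diff = round(100 * sum(a_rs) / a - 100 * sum(b_rs) / b)
    perm_res.append(diff)

# Tabulate how often each value of the statistic occurred
counts = Counter(perm_res)
for value in sorted(counts):
    print(f"{value:>4}: {'#' * counts[value]} ({counts[value]})")
```

Because there are always 11 yeses split between two groups of 10, the difference can only be an odd multiple of 10 percentage points, which is why values such as 0 or 20 never appear.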
Here, 30 says that A is better than B by 30 percent, minus 30 means that B is better than A by 30 percent, and minus 50 is more extreme than 30 percent. For the two-way test, recall that the null hypothesis is that there is no difference between A and B, and the alternative hypothesis is that A is different from B. So how do we compute the p-value for the two-way test? We look at all the counts we got from the permutations and find the results that were as extreme as or more extreme than the one we observed. We observed 30 percent. Which of these values are at least that extreme? 30 itself and 50, so we need to add the 16 and the 5. But because we are doing a two-way test, we also want the negative values that are as extreme as or more extreme than 30 percent, which are minus 50 and minus 30, so we also need to count the four minus 50s and the twelve minus 30s. So the total count of values that are as extreme as or more extreme than the observed 30 percent is the 4 times we got minus 50, the 12 times we got minus 30, the 16 times we got 30, and the 5 times we got 50, and that's what we are summing here: 4 plus 12 plus 16 plus 5.
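The counting just described can be written out in a few lines of Python. This is only a sketch: the counts are the ones read out in the talk, while the variable names and the dictionary layout are assumptions made for illustration.

```python
# Permutation counts as read out in the talk: statistic value -> times seen
counts = {-50: 4, -30: 12, -10: 30, 10: 33, 30: 16, 50: 5}

observed = 30                   # observed A minus B difference, percentage points
alpha = 0.05                    # significance level used in the talk
n_perm = sum(counts.values())   # total number of permutations

# Two-way test: values at least as extreme as the observed one in either direction
two_way_extreme = sum(n for value, n in counts.items() if abs(value) >= abs(observed))

# One-way test (A better than B): only the upper end counts
one_way_extreme = sum(n for value, n in counts.items() if value >= observed)

p_two_way = two_way_extreme / n_perm
p_one_way = one_way_extreme / n_perm

print("two-way:", two_way_extreme, "extreme values, p-value", p_two_way)
print("one-way:", one_way_extreme, "extreme values, p-value", p_one_way)
print("reject null (two-way)?", p_two_way < alpha)
print("reject null (one-way)?", p_one_way < alpha)
```

With these counts the two-way sum is 4 + 12 + 16 + 5 = 37 and the one-way sum is 16 + 5 = 21, both out of 100 permutations, so neither p-value falls below the 0.05 threshold.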
Those are all the counts of values from our permutation test that are as extreme as or more extreme than what we observed in our A/B test experiment, and when you sum them you get 37. So in 100 permutations, where we simulated the same process 100 times, 37 of the 100 values were as extreme as or more extreme than what we actually observed. Now we can compute the p-value as simply the ratio of that count of extreme values to the total number of times we did the process: 37 over 100, or 0.37. So the p-value in a two-way test is the proportion, or probability, of values at least as extreme as the observed one that we would obtain in a process like this just due to random chance. For the one-way test, our null hypothesis is that A is not better than B, so A is either worse than or the same as B, and the alternative hypothesis is that A is actually better than B. So for the one-way test the extreme values are only the extreme positives, because we observed A being better than B by 30 percent: the only extreme values are those where A is at least as much better than B as what we observed. So we count the 16 plus the 5, which add up to 21, and the p-value is that count divided by the total number of values we obtained through the permutation test: 21 over 100, which is 0.21. So we computed the p-value through a permutation test, and we got 0.37 for the two-way test and 0.21 for the one-way test. So what's the decision? Well, the p-value in either case is larger than our alpha of 0.05, so
the conclusion is that we have to retain the null hypothesis: the observed difference could be due just to random chance, and we cannot attribute it to a real difference between A and B. That concludes the hypothesis testing part through a permutation test. We did a permutation test, we got these counts of results, we looked at how many of them were as extreme as or more extreme than the 30 percent we observed at the very beginning, and we used those counts to compute the p-value, which is the fraction of values that are at least as extreme as that 30 percent. They are highlighted here in orange: the minus 50, the minus 30, the 30, and the 50 are the values that are as extreme as or more extreme than 30 percent, and it's the counts, the heights of these bars, that contribute to the p-value. The ones in the middle are less extreme than the 30 percent we observed, so they do not count toward the p-value. This is the bar chart version, and this is the equivalent density plot. Again we have the vertical red line at 30 percent and the dashed red line at minus 30 percent, so the orange area corresponds to the values that are larger than 30 percent or smaller than minus 30 percent, and those areas represent the p-value. Okay, so let's do this with code now. That was an explanation of the concepts, with some data visualizations to help understand them; now we're going to code this from scratch using both R and Python, separately. I'll have R on the left and Python on the right: the R code has the gray background and the Python code has the yellow background. I'll be using some packages just for convenience, but in R most of it is going to be done
in base R, and in Python it's going to be done mostly in base Python as well, although we do need the sample function from the random module, and I'll be using some functionality from pandas as well, just to make our life a little easier. So let's try to reproduce this in code so we can see how it works. First I'm going to define some variables that define our problem. If you recall, we had a total of 20 subjects, so t is a variable that represents the total number of subjects in A and B together, and I'm assigning it the value 20. We assigned 10 of those to A and 10 to B, so a, the number of subjects in group A, is equal to 10, and b, the number of subjects in group B, is also equal to 10. In our observed experiment we had seven people saying yes in group A and four people saying yes in group B, so a_yes is the variable that contains the number of people who said yes in group A, and b_yes is the number of people who said yes in group B. These variables fully define our experiment, and now we can compute a few other results from it. In particular, the total number of yeses, t_yes, is the sum a_yes plus b_yes, and the total number of nos is t, the total number of people, minus t_yes. What we are interested in is the conversion rate, so a_yes_pct is the count of yeses in A divided by the total number of people in A, multiplied by 100 to make it a percentage, and the same for B: b_yes_pct, with pct short for percentage, is the percentage of yeses in B. Our test statistic, the metric we are using to compare the groups, is the difference between A and B, a_yes_pct minus b_yes_pct, and I'm assigning that result to a
new variable called ab_yes_pct, and I'm printing all the results here. This cat is just a convenient way of printing the results in a nice format. The observed yes rate is 70 percent for A and 40 percent for B, and A minus B is 30 percent, which is 70 minus 40. And just to recap the total counts of yeses and nos: we had a total of 11 yeses and a total of 9 nos. So that's the code in base R to define the problem and compute some additional results that are going to be useful moving forward. Here's the equivalent code in Python; it's very similar. In Python the assignment operator is always the equals sign; in R you can use equals or the left-arrow assignment, and I'll be using equals in R to show the similarity between R and Python. All of this code is the same, just assigning values to variables; this part is also the same, using those variables to compute new results and assigning them to other variables. Here it's slightly different: in Python I'm using the print function to print the message, formatted so that it prints similarly to what is on the left in R. R also has a print function, but it is less convenient here; cat is a function that does concatenation and printing together, so it first concatenates the expressions and then prints them to the output. And again we have the same experiment: 70 percent yes for A, 40 percent yes for B, a difference A minus B of 30 percent, and a total of 11 yeses and 9 nos. So this is setting the stage and defining the problem, and you can see how we assign values to variables, use those variables to make computations, and assign the results of those computations to new variables. Sorry, I just saw the comments: are the slides still blurry?
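The slide code itself is not captured in the transcript, but the setup described above could look roughly like this in Python. The variable names are reconstructed from how they are spoken and should be read as assumptions.

```python
# Problem definition (values from the talk)
t = 20       # total number of subjects in A and B together
a = 10       # number of subjects in group A
b = 10       # number of subjects in group B
a_yes = 7    # people who said yes in group A
b_yes = 4    # people who said yes in group B

# Derived quantities
t_yes = a_yes + b_yes            # total number of yeses
t_no = t - t_yes                 # total number of nos
a_yes_pct = 100 * a_yes / a      # yes rate in A, as a percentage
b_yes_pct = 100 * b_yes / b      # yes rate in B, as a percentage
ab_yes_pct = a_yes_pct - b_yes_pct   # test statistic: A minus B

print(f"Observed yes rate: A = {a_yes_pct}%, B = {b_yes_pct}%, A - B = {ab_yes_pct}%")
print(f"Total counts: yes = {t_yes}, no = {t_no}")
```

These few variables are all that the later permutation steps need: the group sizes and the total numbers of yeses and nos.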
It's fine now, good. So next I'm going to show you how to make one permutation, just one, and then we'll use this to make many permutations. I'm going to recap the steps for doing a permutation. Here I'm just setting the random seed so that the results are reproducible: every time you repeat them they are still random, but they are the same random results. If you recall, the first step of a permutation is to put all the results in a bag, and that's what I'm doing here: I'm putting all the results in a vector, where 1 represents a yes and 0 represents a no. In our visualizations I represented yes with a green ball and no with a red ball; now I'm representing yes with the number 1 and no with the number 0. This c is the combine function that is used to create vectors in R, and rep is the repetition function, which here repeats the number 1 as many times as the total number of yeses in our data and then repeats the number 0 as many times as the total number of nos. So this creates a vector of 11 ones followed by 9 zeros, and I'm assigning it to a variable called bag1, which I'm printing here so you can see it. That's the first step, putting all the results of our A/B test in a bag; in this case our bag is a vector. The second step is to shuffle the bag. For the randomization I'm using the sample function from base R to sample bag1, and I'm assigning the result to a new variable, bag2, so that in the end I can print them both separately. And here is
the shuffled bag; it's this one. So we started by putting all the results in bag1, in order, so we can see that we indeed have the expected number of ones and zeros, and in step two we shuffled the bag, so now this is the bag with the same number of ones and zeros but in a random order. Step three is to take a random sample the size of group A. Because the bag is already randomized, I'm just going to take the first a elements of the vector as the random sample for A, and then take the remaining elements of the shuffled vector as the random sample for B. Here I'm using square-bracket notation to slice the vector, selecting the elements from 1 up to a; remember, a is the number of people in group A. R starts indexing at one, which is why the first index is 1. Python starts indexing at zero; we don't use that explicitly here, but it's an important difference between R and Python. And this is just a way of selecting the remainder of the bag, after we take the random sample for A, and assigning it to the vector for B. You can see them here: the random sample for A is exactly the first part of our shuffled bag, and the random sample for B is exactly the last part. So we have a resample for A and a resample for B, and now we can compute our statistic. In the resample for A we count the number of yeses: we got four plus one, five yeses, which is a 50 percent rate, five in ten. Then in B we got one, two, three, four, five, six yeses, which is a 60 percent yes rate. So the difference between A and B is 50 percent minus 60 percent, which is minus 10 percent.
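For readers following along in Python, the five steps of a single permutation can be sketched as below. The seed value and the exact variable names are assumptions, so the shuffled order, and therefore the resulting difference, will not necessarily match the minus 10 percent shown in the talk.

```python
import random

random.seed(42)  # arbitrary seed; the talk's seed value is not given

a, b = 10, 10          # group sizes from the talk
t_yes, t_no = 11, 9    # total yeses and nos from the talk

# Step 1: put all results in one bag (1 = yes, 0 = no)
bag1 = [1] * t_yes + [0] * t_no

# Step 2: shuffle the bag (sampling every element without replacement)
bag2 = random.sample(bag1, k=len(bag1))

# Step 3: the first a elements are the random sample for A
a_rs = bag2[:a]

# Step 4: the remaining elements are the random sample for B
b_rs = bag2[a:]

# Step 5: test statistic, the difference in yes rate (percentage points)
a_rs_pct = 100 * sum(a_rs) / a
b_rs_pct = 100 * sum(b_rs) / b
ab_rs_pct = a_rs_pct - b_rs_pct

print("shuffled bag:", bag2)
print("resample A  :", a_rs, f"-> {a_rs_pct}% yes")
print("resample B  :", b_rs, f"-> {b_rs_pct}% yes")
print("A - B       :", ab_rs_pct)
```

Slicing the already shuffled bag is equivalent to drawing two random samples without replacement, which is why no second call to the random number generator is needed.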
Okay, the same thing that we got in the visual explanation. As a reminder, the difference we actually observed in the real A/B test experiment was 30 percent, but this permutation gave us minus 10 percent. So in our experiment A was better than B by 30 percent, and when we do this permutation we see that by random chance B could be better than A by 10 percent. But this is just one permutation; we need to do many permutations, and that's what's coming next. Before that, let me show you the equivalent code in Python. Here we set the random seed in Python so that the results are reproducible while still being random. Here is a way of creating the bag, in this case a list of ones and zeros, with as many ones as there are total yeses in our data and as many zeros as there are total nos. So this creates the initial bag, which is this list: 11 ones, the total number of yeses in our data, and 9 zeros, the total number of nos. Now we use the sample function, similar to the sample function in R, to shuffle the bag, and assign the result to bag2. Here's the list representing our shuffled bag; as you can see, it's randomized now. As we did in R, we take the first part of bag2, and here we can use the short notation [:a] to get all the elements starting at index 0 up to, but not including, index a, where a is the number of people in group A. We assign that to a_rs, the random sample for A, and then we take the remainder of bag2, from index a on, as the random sample for B. Here's the random sample for A, and you'll see that this list has exactly the same elements as the first part of the shuffled bag, and for B we get this other list, which has exactly the same elements as the second part of the
shuffled bag. Then we can compute the statistics again. Check the resample: how many yeses did we get? For A we got five yeses again, that's 50 percent, and for B we got six yeses, that's 60 percent. So the difference between A and B is 50 minus 60, which is minus 10 percent, as in the other example. That's how we can do just one permutation, both with R and Python.

Now, we need to do this many, many times. How can we do that with code? Here's how. Again we start by setting the seed to make the results reproducible, and by putting all the results in our bag, and the bag is a bag of ones and zeros. Now we need to do step six of the permutation test, which is to repeat the previous steps two to five a large number of times, to do many permutations. How many? I'm going to set this parameter p as the number of permutations that we'll be doing, and we'll be doing 100 permutations. I'm going to create this initial vector, which is going to contain the results of each permutation. It's going to start as a vector of zeros, and when we do each permutation we're going to update the corresponding position of that vector with the value of the test statistic for that permutation.

Now, when we want to repeat the same process many, many times, we use a loop construct. In this case we're going to use a for loop. The syntax for a for loop in R is the keyword for followed, within parentheses, by the variable that we'll be using to iterate, and then in, which gives us the range of values to iterate over. So we're going to iterate from 1 up to the value of p, which is a hundred. We're going to do the loop operation 100 times, and each time i is going to take one of the values from 1 up to 100: i is going to start as number one, then number two, then number three, and so on and so forth. And at each iteration... so, the curly braces here enclose all the
code that's going to be executed at each iteration of the for loop. At each iteration we're going to shuffle the bag, which is step two of the permutation test, then take a random sample for A as we did before, that's step three, take the remainder as the random sample for B, that's step four, and compute the test statistic, which is again the difference between the yes percentage for A and the yes percentage for B. So we do that 100 times, and each time we save the result. The result is saved in this perm_res, which is short for permutation results. In the end I'm printing the results here, and you can see that we have 100 numbers. They are random numbers like 10, -10, 10, -30, and those are the numbers that we get through the permutation test process. These are the numbers that we'll use next to compute the p-value.

We can do the same in Python. Again, set the seed, create a bag with all the data from our experiment, set the number of permutations that we're going to do, a hundred again, and create an initial list of a hundred zeros, which we're going to update to save the results. We're also going to use a for loop. The syntax is slightly different: you can see here that the syntax for a for loop in Python is the keyword for, then the variable for your loop, then in, and here I'm using the range function to define the range to iterate over. This defines a range of 100 elements. In this case they are actually the numbers 0 to 99, but that's just a detail, because Python starts indexing at zero. And again, at each step we shuffle the bag with the sample function, we take the first part of the bag to be the random sample for A, we take the remaining part of the bag to be the random sample for B, and we compute our test statistic, which is the percent yes rate for A minus the percent yes rate for B, and we assign that to
our corresponding element in the list. In the end we print the results here. Printing here is a little more tricky, to make the printing work nicely in these slides, so you don't need to worry about these details; it's just a way of printing the results so that we can see all of them. But you can see that there are also 100 numbers, each of them comes from a permutation, and all together they will let us compute the p-value. They represent the null hypothesis, or the probability model of the null hypothesis.

So we did many permutations, in this case 100 permutations, we got 100 values that represent the model of the null hypothesis, and now we can use these values to compute the p-value. I'm going to show you how to use code to compute the p-value from these results.

In R we can use the table function to get the counts of our results. We can see that, as we had in our data visualization explanation, we got -50 four times, -30 12 times, -10 30 times, 10 33 times, 30 16 times, and 50 five times as our test statistic. Now we're going to use these counts to compute the p-value. Again, the p-value is going to be the ratio of values that are as extreme as or more extreme than the one we observed. We observed 30, so these counts of 16 and 5 are extreme counts, and these counts of 4 and 12 are also extreme counts for the two-way hypothesis test. For the two-way hypothesis test, what we want are all the values that are, in absolute terms, larger than or equal to our observed value. perm_res is the vector with all the results from the permutation test, and obs_pc is the observed difference between A and B in our A/B test. So what we want is all the values in this vector that, in absolute terms, are larger than or equal to our observed difference in our A/B test
experiment, which was 30, and then we're going to sum all of those. The comparison returns a vector of zeros and ones indicating whether each of those values is at least as large as the observed value, and when we sum all those zeros and ones we get the count of all the extreme values in our vector. In the end this is the extreme count, and you can see that it results in 37, which is this 4 plus 12 plus 16 plus 5. The p-value is just that 37 divided by the total number of permutations, which is 100, so 37 over 100 is 0.37. That's the p-value for the two-way hypothesis test.

For the one-way hypothesis test we want the values that are larger than or equal to the observed value itself, not in absolute terms, so the negative ones don't count anymore. We'll just be catching these 16 and 5. 16 plus 5 is 21, and then the p-value for the one-way hypothesis test is 21 divided by 100, which is 0.21. This cat, again, is just some code to make the nice printing that you see here. And this is how you can compute the p-value from the permutation test results: just by finding which of those results were as extreme as or more extreme than the observed result in our A/B test experiment.

You can do the same in Python. To do this in Python, it's convenient to transform our results from a list to a pandas Series; that's what we're doing here. And this is just a little bit of elaborate code to create the equivalent table as in R and print it in a nice way, the same way as it was in R, so you don't have to worry too much about that. But in this case we are using a separate kernel, a Python kernel, so the randomizations happen slightly differently. In Python we got -50 two times, -30 18 times, -10 31 times, 10 32 times, 30 15 times, and 50 two times. Then we can get the count of extreme values the same way. Here, to get the absolute values, we don't use the absolute
function as in R; we use the abs method in pandas for the Series object, which is now this perm_res_s, where the s marks that it is a pandas Series. We again want to count how many of them are larger than or equal to the absolute value of what we observed, which was 30 percent, for the two-way hypothesis test. For the one-way hypothesis test we want the ones that are larger than or equal to what we observed, and for this one we don't use the absolute value. In this case it happens that we also got 37 extreme counts for the two-way hypothesis test, so the p-value is also 0.37. But for the one-way test, which is just this side, we got 15 plus 2, which is 17, while in R we had gotten 16 plus 5, which was 21. Still, in both situations we get very large p-values, so the decision is that we have to retain the null hypothesis, because the p-values are larger than 0.05, which would be our standard alpha threshold.

So to recap, using the data visualizations: we did the permutation test, we got the counts of values in our permutation test, and we used them to compute the p-value as the count of those values that are as extreme as or more extreme than what we observed in our experiment, divided by the number of permutations. This density plot is created from those counts, from that permutation test, and the vertical line represents our observed value, which was 30. The orange area covers the values that are as extreme as or more extreme than what we observed, so this orange area represents the p-value. Indeed, the full area under this density plot is one, and this portion in orange is a probability; it's the p-value itself. It's 37 percent, because it's those 16 plus 5 plus 4 plus 12, which divided by 100 gives you 0.37, the p-value. You can get this graphical visualization of the p-value as a density plot, or as a bar chart as we did before. The bar chart works here because we have very small numbers, very small counts, just for the
didactic purposes. Normally what you'd do would be a histogram like that, and not a bar chart. But in the end what you really want is the density plot, and if you just want to create one visualization of it, the density plot is the one to go with.

All right, so that brings us to the end. We did all those steps with code, step by step. Now, just checking how we're doing on time, I could give you something extra. We are a little over time, but I'll just show you briefly. We used the code step by step, in different slides, with a different part of the code in each. But once we have this code, we can put it all together in a function, and then we can use that function to do all the work for us in one line. So I created functions to do all this work for us, and in particular I created this one function, the A/B permutation test, which I can show you afterwards if you'd like, that will do everything that we did step by step, so that I can show you different results for different A/B test experiments. This A/B permutation test is a function that takes in the parameters that define our A/B test experiment and the number of permutations that we want to do, and it will compute the full permutation test, return the p-value of the permutation test, and show us the graphical visualization of the null hypothesis and the p-value, like so.

So, repeating the same example I've been doing all along: we have 10 people in the A group, so the al parameter for this function is the number of people in group A; bl is the number of people in B, 10 again; we have seven yeses in A and four yeses in B; and np, the number of permutations, is going to be 100. This is what we observed, this was our experiment: we have a 70 percent yes rate in A, a 40 percent yes rate in B, and the difference between A and B is 30 percent. Now we did 100 permutations, and this is the result we get: we get a p-value of 0.36. It's also
represented here as the p-value, and graphically represented there. Now I'm going to show another example where the p-value is much smaller, and also an example that uses much larger numbers. In this experiment, which is a different experiment, we have 1,000 people in A and 2,000 people in B. You don't have to have the same number of people in A and B. Normally you want to have the same number of people in A and B, but you don't necessarily have to. In this experiment we got 30 yeses in A and 100 yeses in B. Now we want to know if this is significant or not. In terms of conversion rates, or yes rates, for A we have 30 over 1,000, which is 3 percent, so only 3 percent of group A was a yes. In B it's 100 over 2,000, which is 5 percent. So the difference between A and B is 3 percent minus 5 percent, which is minus 2 percent. In this experiment the observed difference was that B was 2 percent better.

Now the question is: is this significant or not? We can use this whole process, which is now all done in just one function, and we see this graphical representation and the p-value here. Now the p-value is very small, smaller than 0.05, so we would conclude that this was not due to random chance. We'd conclude in favor of the alternative hypothesis, that there is a significant difference between A and B. So even though this difference maybe looks small, just a minus 2 percent difference, it's actually significant, because the p-value is very small. You can see that the small p-value is now a very small orange area here in the plot. In the previous experiment, the one that we did all along, the orange area was very large: the p-value, which represents the proportion of values that are as extreme as or more extreme than the one we observed, was very large. When there are very few values that are as extreme as or more extreme than what we observed, that area, the orange area, which is the p-value, is going to
be very small. Indeed, the p-value is really small: it's 0.018, that's 1.8 percent. We could even play with this in a Shiny app, which I can show you later in the Q&A if you want, but that's the end of it. So I'll open to Q&A. I hope this was helpful to understand these difficult concepts, and to show you that you can code even difficult concepts with very simple code. We just used simple variables and simple operations, multiplication and division and addition and subtraction, and we used for loops, and hopefully you understood how for loops work. So hopefully this was helpful for you to understand these useful concepts that are sometimes difficult to understand.

Finally, I'll leave you here, and I can share this on the chat as well: if you are interested in learning more with us, you can check us out on our website and look at our different programs. We have bootcamps on data science and data analytics, and we also have professional development courses. You can take individual courses or take course bundles, and you can always reach out to our admissions team. You can apply to all of these for free, and you get contacted by our team to understand what's the best program for you.

Okay, so I'll open now to Q&A. I'll share the presentation: you'll get the presentation and all the materials for the presentation by email.

Okay, so: how long to run an A/B test? That's a good question, and it's related to something we didn't cover, so I'll give you a few pointers, and you'll get this as well. Something we didn't cover in the beginning was how you define all the details of the A/B test, and for that you need to do what's called power analysis. This is more advanced, and that's why I didn't talk about it. When you are defining an A/B test, you need to define how much data you actually need. So it's related: how long
you need to run the test depends on how much data you need. To work that out you do what's called power analysis, where you balance these four factors. The first is the sample size, which is the number of samples that you'll need to use. If you already have all the data, you can use it and do the test right away, but sometimes you actually have to wait for the data to come in, and how many data points you need is going to determine how long you need to wait. If you want to determine the sample size, which is a common task, to determine how many points you need, which is going to tell you how long you need to wait to get all those points, you need to define these other three important factors.

You need to define the effect size you want to see. The effect size is the minimum size of the effect that you want to detect with your test. For example, you want to be able to detect at least a ten percent increase in your yes rate, or your conversion rate, or your click rate, or whatever metric you are looking at. In addition to that, you need to define the power that you want. The power is the probability of detecting an effect when the effect is actually real, which is basically the probability of a true positive. And you also need to define the significance level that we already talked about, which is the probability of a false positive.

So to answer the question of how many points you need and how much time you need, because the more points you need, the more time you need to wait, you need to define beforehand: what's the minimum effect size that I want to see, what's the power that I want, the probability of a true positive, and what's the significance level that I'll be using. Once you define these three, the power analysis will give you the ideal sample size that you need.
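
As a rough illustration of how those three inputs turn into a sample size, here is a minimal Python sketch. It is not the code from the talk: it assumes the common normal-approximation formula for comparing two proportions with a two-way test, and the function name `sample_size_per_group`, the 3 percent and 5 percent yes rates, and the 80 percent power are illustrative choices only.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(p_a, p_b, alpha=0.05, power=0.80):
    """Approximate number of people needed in EACH group to detect the
    difference between yes rates p_a and p_b (hypothetical helper).

    Assumes equal group sizes, a two-way test, and the normal
    approximation to the difference of two proportions.
    """
    std_normal = NormalDist()                    # mean 0, standard deviation 1
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # significance level, two-way
    z_power = std_normal.inv_cdf(power)          # power = 1 - beta
    variance = p_a * (1 - p_a) + p_b * (1 - p_b)
    effect = p_a - p_b                           # minimum effect to detect
    return ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)


# Example: how many people per group to tell a 3% yes rate from a 5% one?
print(sample_size_per_group(0.03, 0.05))
# Asking for more power, or for a smaller effect, increases the answer.
print(sample_size_per_group(0.03, 0.05, power=0.90))
```

Dividing the resulting sample size by the number of data points that arrive per day gives an estimate of how long the test has to run.
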
Okay, and to explain it in detail we'd have to go through a process like the one we just did to explain the p-value, and it would take a whole other hour or two. But you can do all of what we did, and all of this power analysis, with these packages: the pwr package in R and the statsmodels package in Python. They have functions to do all of the things we did, and also to do this power analysis, where you can define the effect size, the power, and the significance level, and there are functions there that will give you the sample size that you need. And with that you'll know how much time you need to wait.

An important note: the power is the probability of a true positive, and you'll often see it defined as one minus beta, where beta is the probability of the type II error, which we talked about, though I didn't mention that it was called beta. So we talked about the type I error: that's the false positive, which has a probability of alpha, our significance level. The probability of a false negative, the type II error, is often called beta, and the power is the complement of the type II error: it's the probability of a true positive, which is one minus beta. All right, I hope that answers the question. You have references there that you can go to, where you have functions that do this for you. Normally we'll not be doing this by hand ourselves; you just go to these packages that have functions ready.

Okay, "there's still a black rectangle": you'll get the slides afterwards. The black rectangle may be the Zoom interface that I'm trying to move away, but I am not able to move it away.

Okay, let's see: one-way or two-way? Whether you do a one-way hypothesis test or a two-way hypothesis test depends on what you want. Are you trying to understand if two things are different, or if one thing is better than the other? Normally, people want to know if one is better than the other.
So if you want to know if option A is better than option B, then you do a one-way hypothesis test. Some statisticians say that, just to be safe, even when you just want one way you do the two-way test, because the two-way test is more restrictive: you have the other tail to account for in the p-value. But if you just want to test if one option is better than the other, you could use just one way. Some statisticians still use the two-way test just to be safe.

All right, let's check if you have any other questions. I think that answers all the questions. Let me give you these links. So the code that you can use to answer those questions that you're asking, those are the links, and the links for our programs are these ones, along with the email for our admissions team, so you'll have that. Any other questions, anyone?

I can show you this Shiny app briefly. It's a more interactive way of looking at what you just saw. Here you can see that as you change the parameters of your A/B test, you change the p-value you get. Here what you have is 100 people in A, where 30 said yes, and 100 people in B, where 20 said yes, so there's a difference of 10 percent there. If I start making the difference larger, you'll see that the p-value goes down, and that at this difference, where we have 17, the p-value is already significant: it is smaller than 0.05. And the closer the results are to each other, the larger the p-value; see, now this is a very large p-value. Here I'm just doing a hundred permutations, but in principle you can do more and do a thousand permutations. You need to do a large enough number of permutations; 100 is probably too small, and you want to do more like a thousand. So you see, if I increase the
difference between A and B, the p-value becomes smaller and smaller, to a point where it will run down to zero. Yeah, now it runs to zero. So the more different the results are, the more likely they are due to a real difference between options A and B. And the closer the results are to each other, like here, where you have the same number of people in A and B, and 30 said yes in A and 25 said yes in B, which is very close, the probability of it being due to random chance is very large, as represented here by the orange area, which is the p-value.

Carlos, did you just share the link to the notebook, or did you share the link to the packages?

I shared the link to the packages. We'll send the presentation afterwards by email; you'll get a zip folder with everything.

All right, thank you everybody for sending me your email address. We'll send the recording, as well as the zip file that Carlos mentioned, in the next few days.

Okay, I think we answered all the questions, if anyone has any last-minute questions... Okay, again, thank you everyone, it was a pleasure. Thank you, everyone.
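
For readers who want to try the procedure themselves, the steps walked through in the talk (build the bag, shuffle it, split it into A and B, compute the difference in yes rates, repeat, count the extreme values) can be gathered into one function. The speaker's own function is not shown in the captions, so the following is only a minimal sketch: the name `ab_permutation_test`, its signature, its return values, and the small tolerance used when comparing floating-point differences are all assumptions, and it skips the plot.

```python
import random


def ab_permutation_test(al, bl, a_yes, b_yes, n_perm=1000, seed=0):
    """Permutation test for the difference in yes rates between A and B.

    al, bl       -- number of people in groups A and B
    a_yes, b_yes -- number of yeses observed in each group
    Returns (observed difference, two-way p-value, one-way p-value).
    Hypothetical sketch, not the function used in the talk.
    """
    rng = random.Random(seed)            # seeded so results are reproducible
    total_yes = a_yes + b_yes
    bag = [1] * total_yes + [0] * (al + bl - total_yes)   # step 1: the bag
    observed = a_yes / al - b_yes / bl
    tol = 1e-12                          # guards against float rounding on ties

    two_way_count = 0
    one_way_count = 0
    for _ in range(n_perm):              # step 6: repeat steps 2 to 5
        shuffled = rng.sample(bag, len(bag))          # step 2: shuffle
        a_rs = shuffled[:al]                          # step 3: sample for A
        b_rs = shuffled[al:]                          # step 4: remainder for B
        diff = sum(a_rs) / al - sum(b_rs) / bl        # step 5: test statistic
        if abs(diff) >= abs(observed) - tol:          # as or more extreme
            two_way_count += 1
        if diff >= observed - tol:                    # "A better than B" only
            one_way_count += 1
    return observed, two_way_count / n_perm, one_way_count / n_perm


# The two experiments from the talk: 7/10 vs 4/10, and 30/1000 vs 100/2000.
print(ab_permutation_test(10, 10, 7, 4))
print(ab_permutation_test(1000, 2000, 30, 100))
```

Because the shuffles are random, the p-values will differ a little from the 0.37, 0.36, and 0.018 quoted in the talk, and from one seed to another. Note also that the one-way count here follows the talk's definition (permuted difference larger than or equal to the observed one), so it is only meaningful when the question is whether A is better than B.
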

Info

Channel: NYC Data Science Academy

Views: 732

Rating: 4.4545455 out of 5

Id: ZdC8dwL0rlI

Length: 103min 50sec (6230 seconds)

Published: Thu Jul 01 2021