Theoretical Statistics is the Theory of Applied Statistics: How to Think About What We Do

Captions
I love speaking here so much. I just love this community, and especially coming from academia, I love how this world is so non-hierarchical. I actually work on hierarchical models, but I feel kind of bad about that, so we call them multilevel models: no levels are higher than other levels. It's not about credentials, or even about money; in this world it's just people doing the work, and I love it. My favorite place to speak ever, except for the Stan conference, of course.

Now, when I spoke at the Stan conference, the talk I prepared was called "Ten things I hate about Stan," and the Stan people were like, you can't do that. I said, no, no, it'll be fine, and it was okay. So then, when I was going to speak here, Jared said, what's your talk? I said, well, let's do "Ten things I hate about R," or actually, you guys are a big conference: "Twenty things I hate about R." And Jared was like, you can't do that. What do you mean? Well, you don't understand; Stan is a close-knit community, and these people, to be honest, don't all have such a great sense of humor. Jared said, if you say something negative about R, they'll jump down your throat; it's a vi-versus-Emacs kind of situation with these people. So I had to throw away my slides, and that's why the screen is blank.

So I want to cast a critical eye on statistics a little bit. R is great; we all know that. How do we know it? Well, I think I've solved a lot of problems with R. Sometimes I'm stuck and can't use another tool, so I use R. Other people find it useful. I've heard that it's increasingly searched for on Stack Overflow. I feel like we've got to get more Stan questions in there; I kept looking for Stan, but we have our own group, of course.

But if you were to take an introductory statistics textbook or class and ask how we know something works, "I use it, so it's good" is not supposed to be good enough. Statistics textbooks and intro stat classes are full of stories where people think the treatment is great but actually it's not, or they think this treatment hurts but actually sick people take the pill and healthy people don't, and when you do a controlled experiment you find it really works, or it doesn't. Lots of examples of surveys: ha ha, they did this survey, but it wasn't a random sample, so they were wrong, wrong, wrong, et cetera.

But when statisticians talk about what they do, what tools they use: why is Bayesian inference any good? Did you do a controlled experiment comparing Bayesian inference to something else? No, we didn't do that. Or, people love my class. How do you know? Did you do a survey? Well, no, I didn't actually do that; I did the equivalent of the Literary Digest poll, or really nothing at all. How do I know that my class is effective? Well, of course, I did a pretest and a post-test and a controlled experiment. No, I didn't do that controlled experiment comparing different ways of teaching my class, but that's okay, because I didn't have a pretest. And you need that pretest, because even with a controlled experiment, people drop out and do different things, and you need to adjust for that.
So it's too bad I never did a pretest, but that's actually okay, because I don't have a post-test either: no measure of my outcome. But that's okay, because I don't actually know what my treatment was; I never recorded that. Usually the textbooks say, oh, you'd better randomize. But randomization is like number 15 on the list of things you should do. First you should figure out what your treatment is, and then you might want to think about what your measure of success is, and your pretest measurement, and so forth. So we're not doing any of that, and I'm not sure what that means. Maybe it means we should be doing that sort of thing; maybe we should be evaluating the use of R experimentally. Maybe it means that these principles don't really apply and we're teaching a bunch of useless stuff. Maybe the principles apply, but only to other problems, not to anything we directly care about. I'm not sure.

Sometimes we can shift what problem we study. There's this whole story about decision analysis. In the textbooks, decision analysis is typically described in terms of personal decisions, and we get these little vignettes: your nephew is trying to decide where to live; he could live off campus, which is less convenient but cheaper, or he could live on campus, and you balance these things with a value function. But nobody solves their decision problems this way, and you wouldn't really recommend it. There's a sort of garbage-in, garbage-out aspect to it; it's more like you figure out your decision and then you go explain it. Does that mean decision analysis is useless? No. I like the term "institutional decision analysis": if you're an institution, a company or a division of a company or a division of a university, and you have to make a decision about how to invest your resources, you need a paper trail. You should have a decision analysis, and you should have to defend it.

This comes up a lot. I've been known to recommend that when people analyze noisy data they use prior information, and some people have said that they don't really have the prior information; that's why they're doing the experiment in the first place. What I say back to them is: I would like you to state what your assumptions are, and you can go from there. You should have to record the assumptions that drive your conclusions.

For example, there's a study, which I might have mentioned in one of my previous talks, from a couple of years ago, of an early childhood intervention. They did an experiment on some four-year-old kids in Jamaica; some of them got an intervention and some didn't. They followed the kids up 20 years later, and the group that had the intervention was making forty percent more money than the group that didn't, and the difference was statistically significant. So the estimate is a forty percent effect with a standard error of fifteen percent, something like that. It's a very reasonable estimate, if your prior distribution on the effect size is anywhere from minus infinity to infinity, a nice uniform distribution, or even something like uniform between zero and an 80 percent improvement. I would like the researchers to have to write that down. If they want to publish that their point estimate is a forty percent improvement, I want them to have to say, in section 3.2 of their paper, "We assumed a prior distribution that was uniform from zero to an 80 percent improvement." And then I could say, well, gee, I didn't really think that helping out a four-year-old's mother for a year is going to give the kid an eighty percent improvement in income when they grow up. I find that hard to believe; I don't see the evidence. And then they would have to defend it. I think that's actually important.
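To make the arithmetic of that argument concrete, here is a minimal sketch in R. It is not the study's analysis; only the quoted 40 percent estimate and 15 percent standard error come from the talk, and the skeptical prior below is an assumption added purely for illustration.

```r
# How the stated prior moves a 40% (+/- 15%) estimate. Toy calculation only.
est <- 0.40   # point estimate quoted in the talk
se  <- 0.15   # standard error quoted in the talk

# Flat prior on the whole real line: the posterior mean is just the estimate.
flat_mean <- est

# Uniform(0, 0.8) prior: renormalize the normal-approximation likelihood on [0, 0.8].
theta     <- seq(0, 0.80, by = 0.001)
post      <- dnorm(theta, est, se)
unif_mean <- sum(theta * post) / sum(post)

# A skeptical normal(0, 0.10) prior (an assumption, not from the talk):
# precision-weighted combination of prior and data.
prior_sd   <- 0.10
skept_mean <- (est / se^2) / (1 / se^2 + 1 / prior_sd^2)

round(c(flat = flat_mean, uniform_0_to_0.8 = unif_mean, skeptical = skept_mean), 2)
```

The first two priors leave the headline number essentially untouched, which is the speaker's point: publishing the 40 percent estimate amounts to endorsing a prior like that, and writing it down in section 3.2 is what makes it something the authors can be asked to defend.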
So, to reel this back in: decision analysis. I like the idea of institutional decision analysis; the concept is still valuable. Similarly, I think a lot of the concepts of designed experimentation and random sampling are useful, even though they don't always describe what we do.

Now let's talk about the Hadleyverse for a second. It's not only that we have to live in the Hadleyverse whether we want to or not; the real issue is that we want to live in the Hadleyverse. This is where we want to be hanging out. What's so special about this universe, this tidyverse, where we would like to live? To me, what's so special is that the Hadleyverse is larger than our usual solar system, our usual galaxy. The Hadleyverse includes things which traditionally were not even quite considered part of statistics; they were just considered good practice. And I think of the history of statistics, to a large extent, as folding in things that seemed like good practice, formalizing them, routinizing them, turning them into mathematics.

Step back a little bit and let's talk about the weather, that popular topic of conversation. Our thinking about the weather over the centuries has moved, you might say, from religion to philosophy to science to technology. Thunder used to be the gods throwing it down because they were angry, I guess, or maybe because they were just in a good mood. Then there was philosophy: what does the weather mean, where is the weather coming from? From some imbalance in the humors of the earth, or whatever the scholastics used to say. Then it became a science, something you can measure, and now it's something we study technologically.

Similarly in statistics: go back 150 years and people were not doing random sampling. Of course, the proto-data-scientists of the era knew that you wanted a representative sample, but it was just good practice: oh, your sample is supposed to be representative. They weren't doing randomized experiments; when people did experiments they would try to balance, all else held equal, but it was kind of, oh yeah, you're supposed to do that. Just like in the old days, before we had today's tools: oh yeah, tidy data, you're supposed to do that, it's good practice. So the central role of academics like myself, as I sometimes see it, is to take practice and put it into our general theory. That's why I say that theoretical statistics is the theory of applied statistics.

We can think of a lot of examples of that in hierarchical modeling. In the old days, as recently as forty years ago, you would see discussions in statistics journals about multilevel models, hierarchical Bayes, empirical Bayes, whatever they would call it, and it was treated almost as a subject of philosophy. There's this kind of paradox, and it goes like this.
Suppose you have data from eight different groups, and you want to partially pool your inferences, because your inference from any one group is noisy. We know now that you can fit a hierarchical model; it's very straightforward. But when people were developing this technology back in the 50s, 60s, and 70s, there were objections. People would say: how do you know whether it's appropriate to partially pool information from different sources? There would be these examples: someone might publish a paper where they have information from eight schools, and they put them together and get better inferences, and then somebody else would say, well, what if we have data from seven schools and something unrelated, a measurement of the speed of light? Okay, well, you shouldn't pool those together. But where in the math does it say not to do that? The math just says that if you have N equals 8, you can do it. So there was this weird question of whether the situations are exchangeable or not, and it was an awkward thing. As our understanding grew, we realized that this is actually a mathematical question, or a technological question. For one thing, if you have the seven schools and the speed of light together, the estimated between-group variance is going to be very large, which means you're going to do very little partial pooling. You'd actually do better with a mixture model; it's a distributional question. You have eight things, and they're not coming from a normal distribution; they're coming from a distribution that's a mixture of seven and one. If you fit that, you'll pretty much get the right answer. Or, if you actually have subject-matter information that the speed of light is different from a school, you can put that in too. But the point is that it all does fit into the mathematics.
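As a concrete picture of that partial pooling, here is a minimal sketch in base R using the familiar eight-schools numbers. The between-group standard deviation tau is fixed at an arbitrary value purely for illustration; in a real analysis it would be estimated, for example with a hierarchical model in Stan.

```r
# Partial pooling of 8 noisy group estimates toward a common mean (toy sketch).
y     <- c(28, 8, -3, 7, -1, 1, 18, 12)    # estimated effects in the eight groups
sigma <- c(15, 10, 16, 11, 9, 11, 10, 18)  # their standard errors
tau   <- 5                                  # assumed between-group sd (would normally be estimated)

w      <- 1 / (sigma^2 + tau^2)             # precision weights
mu_hat <- sum(w * y) / sum(w)               # pooled overall mean
shrink <- tau^2 / (tau^2 + sigma^2)         # fraction of each raw estimate that is kept
pooled <- mu_hat + shrink * (y - mu_hat)    # partially pooled estimates

round(cbind(raw = y, partially_pooled = pooled), 1)
```

If one of the eight "groups" were really the speed of light, its estimate would sit far from the others, the implied tau would blow up, the shrinkage factors would go to one, and you would get almost no pooling, which is exactly the behavior described above.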
Another example would be regularization. We know about the lasso; Bayesian methods, all sorts of methods, will regularize the fit to your data automatically. Back forty or fifty years ago, people would talk about that and ask: how much should you regularize? There was this tuning parameter. Now we have all sorts of ways of estimating the tuning parameter and we don't think about it, but it used to be considered almost a personal preference: I like it smooth, I like it rough, that kind of thing.

What's another one? Model checking. Back in 1991 I went to a Bayesian conference, a conference just full of Bayesians, that's all they were, and I was going around looking at the posters. People had fit these models, and I'd ask, does your model fit the data? Could I see? I remember one case in particular, someone's poster: it was a model of circadian rhythms and when you sleep. No, not circadian rhythms exactly; it was a model of sleep. I think they had people at four stages of sleep, and they were measuring people in the middle of the night. I don't remember the details, but you have this time series jumping up and down, they had a statistical model, and they fit it. And I was thinking, hey, could you create some simulated replicated data from your model? Why would you do that? I just want to see if your model fits the data. No, no, you don't understand: we can't check whether the model fits the data; you're not allowed to do that. Why not? Because our models are subjective, so they're uncheckable. That just never made sense to me: because they're subjective, you shouldn't want to check them? Then why are you telling me your model? Why are you talking to me at all? Just go back into your cave. The issue was that their philosophical framework, their theoretical framework, was too narrow. Over the years we've incorporated model checking into Bayesian data analysis, so that now you could say, well, you don't have to feel bad about model checking if you're a Bayesian. But it's actually more than that: we can check our models more effectively.
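For readers who have not seen the "simulate replicated data" idea in practice, here is a minimal sketch in R of a predictive check on a deliberately toy setup: a normal model fit to skewed data, with the sample minimum as the test quantity. It stands in for, and is not, the circadian-rhythm model from the story.

```r
# Predictive check: does data simulated from the fitted model resemble what we saw?
set.seed(1)
y <- rexp(100, rate = 1)                 # toy "observed" data (clearly not normal)

fit <- c(mu = mean(y), sd = sd(y))       # fit a simple normal model by moments

n_rep <- 1000
min_rep <- replicate(n_rep, {
  y_rep <- rnorm(length(y), fit["mu"], fit["sd"])  # replicated data under the model
  min(y_rep)                                       # test quantity: the sample minimum
})

# If the observed minimum sits in the far tail of the replicated minima,
# the model fails to capture that feature of the data.
mean(min_rep >= min(y))                  # predictive p-value for the minimum
hist(min_rep, main = "Replicated minima"); abline(v = min(y), lwd = 2)
```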
It's kind of funny when you look at a book like Tukey's Exploratory Data Analysis from 1977, or the paper from 1970 that it was based on: he had a lot about graphics but nothing about models. Tukey was an old-time modeler, he invented the fast Fourier transform, but he didn't like the models he was using. His was sort of a kick-the-ladder-out-from-under-us approach to statistics: you can use the models to create these graphical ideas, and then you should just look at the graphs and not do any modeling. What we've learned since then is that if our models are more advanced, we can actually do more effective graphics.

Another example would be missing data. Traditionally, missing data is a pain in the ass, right? Oh, I'm missing data, let's do something about that. But maybe we should model the missing-data process; you can do better that way. In the 70s, 80s, and 90s a lot of work was done in that area, also in particular applications like survey non-response. I still deal with people in the survey sampling industry who don't like to think about survey non-response. Well, a good telephone survey might have a nine percent response rate. Those of you who are good at doing math in your head will realize that that's a 91 percent non-response rate, which is kind of an issue. So we can handle that: when we do Mister P, multilevel regression and poststratification, non-response is right there, and it really makes a difference in real surveys.

I don't know if any of you saw it, but there was an article in early October in The New York Times where they did a survey in Florida and sent the data to four different pollsters and had them estimate the standing of Trump and Clinton in the election. Different adjustment methods gave different results, and the group that used Mister P happened to say that Trump was ahead by one percentage point. That's just one case, of course, but the point is that taking something that was considered outside the realm of statistics and putting it into statistics is incredibly valuable. It may be hard for you to believe, but if you go back to the old literature, missing data was not really treated in a systematic way. It was considered something to look away from and not think about. And you still see this in a lot of experimental science, in medicine, biology, psychology: you see published papers where the missing data are wished away. People drop out, and the authors don't even tell you why they're not there, because they don't have a way of handling it, they don't have a way of talking about it, and they know they're not supposed to.

So I love the Hadleyverse, just for what it is, but also as an exemplar of this idea that we can think and act systematically about things that were considered to be merely matters of grooming, as it were.

I want to talk a little bit about workflow. I think workflow is very important to all of us; this is a room of people who analyze data, and I would like our workflow to be more formalized. The same way that partial pooling got integrated into Bayesian hierarchical modeling, the same way that model checking became formalized, or missing-data analysis, there are many steps in our workflow which are now considered good workflow but are not yet formalized, and I would like us to be able to do that.

Of course, some of this we have done, like model checking. We have a sort of proto-workflow: you build a model, you do inference, you check the model using predictive simulation, then you improve the model. That's not new; it's on page one of the first edition of Bayesian Data Analysis, from 1995. So that part isn't new, but it's missing a bunch of steps. Some of the steps are conceptually not such a challenge, like being able to fit bigger models: scalability, scalable inference, which is what I spend most of my time on. But the model-checking part has some open questions, like: what graphs are you supposed to look at? It often helps to fix the mind to think of an artificial intelligence trying to solve the problem. So I imagine an AI that's doing Bayesian data analysis; it fits a model and then decides it wants to check the model. What does it do? It goes into R, writes a program, and makes a graph. It prints out the graph, then it has a giant robot arm that carries the graph over to a scanner, scans the graph in, and puts it into a sort of robot retina, which then reads it and looks for patterns. Which is kind of ridiculous.
Except that's what we do, right? So it's not so ridiculous. But maybe there's a more efficient way of doing it; at least you wouldn't have to print it out, and color printers are always running out of ink and all that. So part of it is: how do we systematize that?

Then there's the idea of model expansion, of model building. That's always mysterious, right? In statistics textbooks, how does it work? There's this experiment that gets done: some smart scientist comes up with a treatment, it's compared to the control, they do a randomized experiment, it's statistically significantly better, and so lives are saved. End of chapter four. That's how it goes. But where did the theory come from? Where did that scientist come up with the idea? Maybe using some other statistics; it's not there.

There's a problem, though. I wrote a paper recently called "Null hypothesis significance testing is incompatible with incrementalism," which was my personal attempt at the worst possible title. I had this idea that if the title is bad enough, it's good. "Null hypothesis significance testing is incompatible with incrementalism": what does that even mean? It's like those graphs that are so hard to read that people like them, because you have to figure them out.

My point was that in a lot of the world, our treatment effects are incremental. In medicine, it's kind of required that your treatment effects be incremental, because you're supposed to compare the treatment with the best available alternative. It's not, okay, we're either going to do this great drug or a sugar pill; it's supposed to be the great drug versus the current status quo, which is probably a small, incremental improvement. But when the treatment effects are small, your standard error is going to be big compared to your estimated treatment effect, and the statistical significance filter says your estimates are going to be too large. This is what happened with that study in Jamaica, the early childhood intervention: the effects are small, the data are very noisy, and of course with something like an early childhood intervention you can't do a within-subject design, so it's very noisy, and the effects are going to be overestimated. Maybe overestimating the effect size is good, because early childhood intervention is a good thing; maybe we should be overestimating the effect sizes. But then I think we should be honest about it.

It's a funny thing, because of that estimate of 40 percent that I told you about. And I'm picking on this study not because it's bad but because it's good; picking on a bad study is easy, and I don't do that sort of thing, I don't talk about bad research, that's boring. What I want to talk about is the good stuff: top researchers doing a serious study of a hard problem. So how are we going to deal with that? How are we going to study this? It's an unbiased estimate, right? They're taking the mean; forget about missing data and stuff like that; they're running their regression, whatever, and they have a coefficient. Well, it's not an unbiased estimate, because, remember, theoretical statistics is the theory of applied statistics: you have to study what we actually do, and what we actually report is the statistically significant results. If the standard error is 15 percent, then the estimate has to be at least 30 percent in absolute value to be statistically significant, either plus 30 or higher or minus 30 or lower. Suppose the true effect size is, say, 3 percent: then the estimate is guaranteed to be ten times bigger than the truth. That's kind of horrible. It's not unbiased at all; it's really, really biased, biased by a whole order of magnitude. And outside of astronomy, an order of magnitude really means something.
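A quick simulation makes that filter concrete. This is a toy sketch using only the numbers quoted in the talk, a true effect of 3 percent and a standard error of 15 percent; it is not a reanalysis of the Jamaica study.

```r
# The statistical significance filter: what survives selection is exaggerated.
set.seed(1)
true_effect <- 0.03                    # small, incremental true effect (from the talk)
se          <- 0.15                    # standard error (from the talk)

est    <- rnorm(1e5, true_effect, se)  # sampling distribution of the estimate
signif <- abs(est) > 1.96 * se         # which estimates clear the significance bar

mean(est)                              # close to 0.03: "unbiased" before selection
mean(abs(est[signif])) / true_effect   # exaggeration factor after selection: ten-fold or more
mean(est[signif] < 0)                  # a sizable share of significant results have the wrong sign
```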
So, yeah, that's too bad. What do we have? We have a bunch of p-values that don't actually represent the probability of seeing something more extreme than what you observed if the null hypothesis were true. We have a bunch of confidence intervals that don't actually contain the truth 95 percent of the time. And we have a bunch of supposedly unbiased estimates which, on average, are biased: they're too large. So obviously our theory has some gaps. And my Bayesian approaches have some gaps too: where did the models come from?

So I have an idea, and it's not unique to me. The idea is that there's a model space. Perhaps some of you are used to thinking of graphical models, of variables in a space connected by edges, but that's not what I'm going to talk about now. I'm going to talk about a space of models: a space in which each model is a node, and the models are connected by edges. I have a model where I take the difference between the treatment mean and the control mean. Then I have a model where I control for the pretest; that's connected to it. Then I have another one where I control for groups, with a multilevel model for the groups; that's another model. Or I could have the groups and the pretest, or the groups interacted with the pretest, or a measurement-error model if the pretest is measured with error, or the treatment could be measured with error. Every time I add another twist, I'm connecting the model to another model and putting it together.
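One way to picture that network concretely is as a ladder of regression formulas, each a single twist away from its neighbors. This is only an illustrative sketch: the variable names (y, treatment, pretest, group) are hypothetical, and the multilevel terms use lme4/rstanarm-style syntax.

```r
# A tiny corner of "model space": each formula is a node, each added term an edge.
models <- list(
  m1 = y ~ treatment,                          # difference between treatment and control means
  m2 = y ~ treatment + pretest,                # adjust for the pretest
  m3 = y ~ treatment + pretest + (1 | group),  # varying intercepts by group
  m4 = y ~ treatment * pretest + (1 | group)   # let the treatment effect vary with the pretest
)
# Given a data frame, m1 and m2 could be fit with lm(), and m3 and m4 with
# lme4::lmer() or rstanarm::stan_lmer(); the point is that models form a
# connected graph, not a fixed menu.
```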
So model building is like language. I mean, it really is; Stan is a language. What that means is that there isn't a menu of models. It's not like there are eight models we can fit. Just as when we speak, we're not like one of those characters from a totalitarian government who can only speak one of eighteen sentences, and each time he talks to you he uses one of the sentences and you have to figure out what he means from the context. It's not like that. Modeling is like what I'm doing right now: I'm saying sentences that have never been said before, amazingly enough, even though I'm just putting together a bunch of simple words. So model building requires a sort of calculus of models; it requires this idea that models are connected to each other. And that shouldn't just be a bunch of talk: we need the computer program that does it, because computer programming is basically the last bastion of rigor in the modern world. I don't trust anything until it runs, in fact until it runs twice, on the computer.

It's funny. When I was in college I took a class called Introduction to Design, where we had to build a machine. There was a long beginning part of the class that was kind of boring, but at the end we basically spent two weeks in the machine shop, except that it was only open 14 hours a day, so we'd get in at 8:00 in the morning and go until 10:00 at night, building our little machines. There are a couple of things you learn. One is that the laws of physics are immutable: you try to build a trigger, and if you don't have a long enough lever arm, it's just not going to trigger, things like that. But also, something can work five times and then break. Which is kind of funny, because when you write a computer program, if it runs, it runs. Well, maybe not anymore, because if you write for somebody else the environment can change, but in some sense a machine really will break if you put too much stress on it. That's another world entirely. So I would like you all to remember that you may spend most of your time in the computer, but the real world, the physical world, has another aspect to it.

So I think we need a theory of models. This connects to model building: if you had an AI building a model, the idea is that it would walk along this network, just as, if you wanted an AI to speak, it wouldn't choose sentences out of a basket of sentences; it would try to imitate how we form our sentences, how we have our thoughts. And in model checking, similarly, there must be a way to do this.

One thing I find a challenge, an open question about this network of models, is how you relate the models to each other. I don't want to just say, here's a model, here's a model, here's a model, and I'm going to compare them based on which has the minimum WAIC or the minimum cross-validated error, LOO or whatever. That's not really the point, and I can say that in two ways. One: there are just a lot of models out there, more models than you could possibly compare that way. Another way of saying it: suppose I knew the correct model, either the true data-generating process or the model that I ultimately want, and I know that I want it. Even then, I would like other models. I would like a model that's a little simpler, to show that the model without this feature wouldn't work; I need a model that's too simple, that shows why I needed to do what I did. And I need a model that's a little more complicated, to show that, yes, I could have added more to the model, but it wouldn't have changed anything. At the very least I need to do that. So how do I compare models? I don't want to just compare their LOO, their leave-one-out error, or anything like that. I think usually when we compare models, we compare them at the parameter level. So you have models in model space that are connected by edges when they're similar to each other, but then there's an operation, comparing models, where, if each model is itself like a little graph, you're comparing the parameters of the models to each other. This is complicated, so we need a theory for that. And again, the Hadleyverse is not going to solve this particular problem, but that's fine. The point is that the Hadleyverse is to being clean with your data as this hypothetical, non-existent R package, or whatever you want to call it, a theory of model integration, is to this problem of comparing models and building models. There's a lot that we do that we don't have theory for.
So what is theory? Does theory have any value at all? Does math have any value at all? Well, a little math is good. For example: I have my problem, I fit it on a thousand data points, and it took two minutes. If I had ten thousand data points, would it take twenty minutes? Two hundred minutes? It depends on how things scale, and obviously a little math is going to be helpful here.

I can give you another example. A few years ago, in 1992 actually, a colleague and I were forecasting the presidential election. We weren't really interested in forecasting the election; we were interested in demonstrating that the election could be forecast using fundamentals, not using polls, just using economic conditions. So we did this state-by-state forecast, and as a by-product we computed things like: what's the probability the election is exactly tied, so that it goes to the House of Representatives? What's the probability that your vote is decisive, so that the vote in your state is tied and your state is required for the national result in the electoral college? Now, there had been literature on this which was purely theory-based, binomial distributions, ridiculous stuff. Somewhere I saw a paper that said the probability of the election being tied is one in ten to the ninety-two. That's ridiculous, of course, because elections are sometimes within a thousand votes, so the probability somehow can't be much less than something like one in a million. One in ten to the ninety-two is just the wrong order of things, off by an enormous amount, but that was the pure-theory approach. A purely empirical approach would be: we have a forecast, we do simulations; but you would need to do something like a billion simulations to figure out the probability of a tie. So we used a little theory: you can figure out the probability that the election is within a thousand votes, then divide that by a thousand, and there's some smoothness, so that worked.
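Here is a minimal sketch in R of that trick on a toy single-state forecast. The turnout number and the forecast uncertainty are made-up assumptions for illustration, not the 1992 model.

```r
# Estimating the probability of an exact tie from the probability of a near-tie.
set.seed(1)
n_sims    <- 1e5
n_voters  <- 6e6                                      # hypothetical two-party turnout
dem_share <- rnorm(n_sims, mean = 0.50, sd = 0.03)    # forecast uncertainty about the vote share
dem_votes <- rbinom(n_sims, n_voters, dem_share)
margin    <- dem_votes - (n_voters - dem_votes)

mean(margin == 0)                # brute force: with 1e5 draws you almost never see an exact tie
mean(abs(margin) < 1000) / 1000  # near-ties are common enough to estimate well; dividing by the
                                 # roughly one thousand margins in that window approximates P(exact tie)
```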
That gives me some insight into what theory is and what the difference is: theory is scalable. Simulations are not scalable, but theory is scalable, and that's what it gets us. That's my theory of theories. It's not always presented that way in the textbooks, though. Why do you need to learn the central limit theorem, or whatever? You need to learn it because it's scalable, because you can't always figure out what's going to happen by brute force: with a small sample you can do the simulation, with a large sample the theory will work, and of course you need a theory of the approximation and so forth.

There's a lot of room for meta-theory also. This came up recently in a discussion on the blog about whether programming should be a prerequisite for learning statistics, and I think some people correctly said no, it shouldn't be a prerequisite. Programming is kind of hard, and statistics is kind of easy. For example, the graphs that were made in the previous talk were made using a lot of programming; however, if someone wants to be a statistician, or to use statistics, he or she should be able to learn how to make graphs like that without having to program, using a menu or whatever, and certainly we should be able to interpret such graphs without programming. So I agree.

But there's a saying. In the 20th century (I'm still not completely used to talking about the 20th century as being in the past) it was said that if you wanted to be a statistician, you had to be a little bit of a mathematician, whether you wanted to be or not. You can maybe see where this is going: in the 21st century, if you want to be a statistician, you have to be a bit of a programmer, whether you want to or not. So programming should not be a required prerequisite; you should be able to take introductory statistics and get your way through it. But it should be no more and no less of a prerequisite than mathematics. If you want to make the argument that you can do statistics without any programming at all, I will accept that argument. If you then want to make the argument that you should be able to do a certain amount of statistics without any math at all, not even algebra, forget about calculus, without even knowing how to solve for x, then I would say yes, you could. Of course it will help if you know how to program; you'll be a better statistician if you know how to program and if you know a little math and can solve algebraic equations for x. If you don't know any math, it's going to be hard to understand these one-over-square-root-of-n things, but if you can program, you can get at it another way; in that sense programming is more important. We have to get used to understanding that.

I would just like to conclude by saying that workflow is such an important part of our working lives. I can't imagine anyone here would disagree when I say that not only do you spend a lot of time on your workflow, but each of us spends a lot of time thinking about our workflow: analyzing yourself, thinking about what works and what doesn't. And to the extent that we do statistics, as I'm sure some of us in this room do, or data science, much of workflow is still not inside the tent of statistical theory. I think it could be, and I think we should be heading that way. In the meantime, we should be thankful for how much current statistical theory and methods do include. We should be thankful for the Hadleyverse, and thankful for languages such as Stan that incorporate model building as part of the process. And each of us, when we are doing our work, should look at what we're doing and think about our workflow in an abstract way. There's a principle in programming that you should never type the same line of code twice; similarly, if you're doing something over and over, try to systematize it. Try to systematize your systematizing: think about what you're doing and create principles. Be an amateur statistical theorist, because we, the statistical theorists, haven't caught up to you yet. You can do a lot, just as someone doing a sample survey in the era before random sampling could come up with a lot of good ideas by recognizing the challenges they faced. So you should do that: respect your own challenges and try to come up with a theory of yourself, rather than waiting for us to do it for you.
Info
Channel: Work-Bench
Views: 24,039
Keywords: analytics, data science, big data, r conference
Id: cuE9eHSbjNI
Length: 39min 6sec (2346 seconds)
Published: Tue May 16 2017