Jan Vitek - Getting everything wrong without doing anything right!

Video Statistics and Information

Reddit Comments

The repetition study is here https://arxiv.org/abs/1901.10220

👍︎︎ 5 👤︎︎ u/altgamer1911 📅︎︎ Jul 19 2019 🗫︎ replies

This should be at the top of /r/programming

👍︎︎ 5 👤︎︎ u/sexcrazydwarf 📅︎︎ Jul 20 2019 🗫︎ replies

It sounds like Philip Wadler is in the audience and gets into an argument with someone else in the audience who is mostly inaudible. Can anyone make out what the inaudible guy is saying?

Where Wadler asks his question. The response shows up soon after.

👍︎︎ 3 👤︎︎ u/Condex 📅︎︎ Jul 19 2019 🗫︎ replies

Damn. That's a lot of work he went through. But good stuff.

👍︎︎ 2 👤︎︎ u/victotronics 📅︎︎ Jul 19 2019 🗫︎ replies

Awesome talk, bad audio quality.

Here's the link to the original threads about the study dissected in this talk: https://old.reddit.com/r/programming/comments/732fes/a_largescale_study_of_programming_languages_and/ and https://news.ycombinator.com/item?id=8558740

👍︎︎ 2 👤︎︎ u/RockstarArtisan 📅︎︎ Jul 19 2019 🗫︎ replies

Abstract says:

>only four languages are found to have a statistically significant association with defects, and even for those the effect size is exceedingly small.

This shows bug patterns better than any table I have seen. It shows the value of combining QA methods to catch mistakes at every step.

https://docs.google.com/spreadsheets/d/1h1bpuggseVZ65KiuPdNDrnvomfH5-lXHBMiCyyr4mRk/edit?usp=sharing

👍︎︎ 1 👤︎︎ u/eddyparkinson 📅︎︎ Jul 20 2019 🗫︎ replies
Captions
Hello everybody, I'm Jan Vitek, and I'm happy to be here and to be able to tell you about this work. The title of the talk is an homage to... wait, this isn't working... the title is an homage to a paper by Mytkowicz, Diwan, Sweeney, and Hauswirth called "Producing Wrong Data Without Doing Anything Obviously Wrong", and if you haven't read that paper, you should. It's one of the best papers in computer science: it makes you laugh, it makes you cry. We'll try to do the same here. I should say this is work that was done with students and collaborators over a period of about six months.

The last few years have been very good for programming languages. There are so many of them in use; they're varied, they have interesting features, they are exciting, and each of us has a preferred one. We all have our preferences, and if we were given the choice to pick a language for a project, most of us would probably pick different languages. The problem with choosing a language for a large project with many developers is, to paraphrase someone, that once you've made that choice you're kind of stuck with it. You have a pile of legacy code, and it's hard to back out of the decision. So the question is: what is the right language? How do we pick? I work in the programming language community, and unfortunately we have very little good advice to give you.

So what is programmer productivity, and do various programming languages make you more productive? The answer is: I don't know. I have my preferences, aesthetic choices that drive me towards one language or another, but as a community, evaluating programming languages is mostly a failure. It's not even a failure; we don't do it. A failure would mean we tried and didn't succeed. We introduce new languages and new paradigms without a shred of scientific evidence to back those choices, and that's typically not considered a good thing.

What can we do in our community? Well, there's one thing we know how to do, and that is take a mostly unrepresentative set of benchmark programs and tell you how fast a language computes Mandelbrot, and whether Mandelbrot is faster in this language or that one. That's what we know how to do. If we wanted to actually measure the productivity of a language, we don't really have much of a clue what to measure or how to measure it.

So let's imagine what a good experiment would look like. If I wanted to compare, say, C++ and Haskell, I would get teams of programmers, multiple teams with various levels of expertise, give them the same task, measure all the attributes of the development, repeat that tens of times, and eventually I would be able to say that the projects in Haskell were clearly better because we measured this, this, and that. And I'm not saying what "that" is, because I actually don't know. So this is really a pretty sad state of affairs.

Some communities have tried to answer this question. The experiment I was describing, with large teams repeating the same problem, is obviously not realistic. So what can we do? If we can't run the right experiment, we may try to see if there's data out there from which we could infer the information we want. One potential source of data is GitHub: there's lots of data there, histories of projects written in multiple languages, over years, by various teams of developers.
So can we, from GitHub data, get back to that question of what is a good language, and how would we do it? This talk is about a paper written by a team of researchers who were at UC Davis at the time. That paper was called "A Large-Scale Study of Programming Languages and Code Quality in GitHub", and its goal was to answer four research questions. I'll focus on just the first one for this talk. The first research question asks: are some languages more defect-prone than others? Using language A, will I get more bugs than using language B? That seems to be getting at what we want to know, namely evaluating languages.

So what do they have to work with? They have GitHub, and in GitHub you have a stream of commits that describes changes to code, in one or more files, written in one or more languages, with information such as what was changed, who did it, at what time, and so on.

How did they go about answering the question? They followed this methodology. The first thing they did was pick languages: they took a ranking of the most widely used languages, picked 17, and for each of those 17 languages tried to find the 50 most starred projects. On GitHub, if you like a project you can give it a star, and the more stars, presumably the more interesting the project, on some axis. So they had about 800 projects, and they got all of the commits, roughly a million and a half. Each project may be written in multiple languages, so they split the histories by language: for each project, if there are two languages, they split it into two streams. Then they threw away projects with too few commits, because those are probably not relevant. Then, for each commit, they look at the text of the commit message and try to determine whether the commit is fixing a bug; if it is, they call it a bug-fixing commit, and the idea is that a project with more bug-fixing commits had more bugs. How do they label these? They use techniques from the software engineering literature, basically looking at keywords that suggest the commit is a bug fix. Finally, they apply a negative binomial regression, which in effect draws a line across the languages: if you're above the line you have more bugs, if you're below the line you have fewer bugs.
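To make the shape of that pipeline concrete, here is a toy sketch in Python. The study's actual analysis was written in R, and its real keyword list, control variables, and model specification differ; the commit messages, projects, and numbers below are entirely made up.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Step 1: label commits as bug-fixing by keyword matching on the message.
# This keyword list is illustrative, not the study's actual list.
BUG_KEYWORDS = ("error", "bug", "fix", "issue", "mistake",
                "incorrect", "fault", "defect", "flaw")

def is_bug_fix(message: str) -> bool:
    msg = message.lower()
    return any(kw in msg for kw in BUG_KEYWORDS)

print(is_bug_fix("Fix crash when parsing an empty file"))  # True
print(is_bug_fix("Add French translation"))                # False
print(is_bug_fix("Use the default port"))                  # True: "fault" matches inside "default"

# Step 2: aggregate bug-fix counts per project and regress them on language
# with a negative binomial model, controlling for project size.
rng = np.random.default_rng(42)
languages = ["C", "C++", "Clojure", "Haskell", "JavaScript", "Scala"]
rows = []
for lang in languages:
    for _ in range(30):                       # 30 synthetic projects per language
        commits = int(rng.integers(100, 5000))
        # every language gets the same underlying 15% bug-fix rate
        bug_commits = int(rng.binomial(commits, 0.15))
        rows.append({"language": lang, "commits": commits,
                     "bug_commits": bug_commits})
df = pd.DataFrame(rows)

fit = smf.glm("bug_commits ~ C(language) + np.log(commits)",
              data=df,
              family=sm.families.NegativeBinomial()).fit()
print(fit.summary())   # per-language coefficients and p-values, analogous to the paper's table
```

With identical underlying rates, the per-language coefficients should hover around zero; the rest of the talk is about the many ways the real data and labels violate the assumptions baked into a model like this.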
Here is the main result from the paper, and the way to read it is: you have control variables, then your 17 languages, and for each language there is a coefficient, either positive or negative, saying more bugs or fewer bugs, and a p-value that says whether the result is statistically significant. Anything under 0.05 is considered statistically significant; entries with a dash are not. To represent this visually, look at the other side of the slide. There are four languages I've put on top, because they're better languages, closer to heaven, right? These are Scala, Haskell, TypeScript, and Clojure. Then there is a bunch of languages in the middle, where the color difference marks languages for which the data was not statistically significant, and then there are the bad languages at the bottom: your PHPs, your Cs, your C++s, and your Objective-Cs. This makes perfect sense, right? This is what we've been telling our students. Well, some of us have; I see some heads nodding in dissent. But let's say there is a part of the community that would read this as: statically typed, functional: good; imperative, weakly typed: bad; everything else: meh. OK, so that's what they did.

I read this paper when it was republished, and there was one sentence that kept nagging at me, like an itch. The sentence said that a single project, Google's V8, a JavaScript project, was responsible for all the errors in the memory category. Now, I know that V8 is a virtual machine for running JavaScript, but it is written in C++. So that kept bugging me: that feels wrong, and if they got that wrong, what else? Eventually we asked the authors to give us their code and their data, and we said, let's have a look; what does the data actually look like?

In the category of "let's have a look", there are several things you can do. There is a sort of scale of reproducibility. At one end, you have a paper and results, and what can you believe? You can believe the authors, and that's it. Then you can do what is called a repetition, which is what the artifact evaluation committees do in our conferences: you take the authors' code and their data, you just rerun the thing, and you should get the same result. That sounds very weak, but it's better than nothing. Then you can do a reanalysis, where you take their data and code but tweak the methods to see how robust the results are. Or you can do a full independent reproduction, where you redo the whole thing from first principles.

The first thing we decided to do was a repetition, because that's easy, right? We had the data from the authors, about 3.4 gigabytes, and 7,700 lines of R code, and I'll just focus on the first research question. We were mostly able to reproduce it: on the left is the original table, on the right is the repetition, and the arrow points at the fact that when we run their code on their data, we don't get the same result for one of the languages. It's not statistically significant anymore, and the numbers are different. That's odd, but fine, we'll call it a successful repetition. We were not able to repeat the other research questions, for reasons such as missing code and data that seems to have rotted. But OK, focusing on the first question: we can repeat it. Let's try to reanalyze it, because I was still unsure.

So the next step was reanalysis. What does reanalysis mean? Basically, any analysis pipeline looks like this: there is the real world, there is some process to acquire data from the real world, then some process that cleans and formats the data, then statistical analyses, and then you get a conclusion. Reanalysis simply takes each step and asks: can we get the same data from the real world? Can we validate that the cleaning was correct? Can we validate the statistical methods? Do we still get the same conclusion? Simple, right? That turned out to be about five months of work for three people.

So what did we find? We found a number of things, and the high-level bit is: no, the conclusion doesn't hold. But the interesting thing is why, and for this talk I'll summarize it in seven points, which we could think of as seven sins of the data analysis. I'll just walk you through them.
Sin one. This is from the paper: for each language, the three largest projects they had in that language. Fair enough. For C you have Linux, which clocks in at 17 million lines of code, a decent-sized real system. For C++, WebKit, or some version of WebKit, at 3 million lines of code; not bad. There is a Haskell program, Pandoc, at 20,000 lines of code, still a real program. Scala had 61 thousand lines of code. And then rails-dev-box had 16 lines of code. The point here is that you are going to draw conclusions from projects that range from 3 million lines of code down to 16, and somehow they are all treated the same; they have the same weight. That's unlikely to be good. Here's a graph showing the same thing in another way: one axis is commits, in log scale, the other is the number of bugs. I've drawn a line through it, and you can see that most languages are fairly close to that line, but some languages have far fewer commits, while C, for instance, has plenty of them. So you're comparing very, very different beasts.

That's the first sin. The second I'll just mention quickly: there are a lot of uncontrolled effects. This graph, which is kind of cute, shows the lifetime of each project in months; short lines mean a younger project, longer lines a longer lifetime, and the blue line shows the trend in the proportion of bugs. If you look at Linux, the line is flat: it has about the same proportion of bug-fixing commits from one month to the next. But some projects decrease the number of bugs over time, and others increase it. Why? I don't know. But if we aggregate all of this together, it smooshes a lot of effects, and there is certainly information there. As a side note, if you look at Linux, the last few months have way, way fewer bugs, which looks like a weird outlier.

Here's the third sin. Same table with the biggest projects, and let me point out: this is WebKit, this is Bitcoin, this is WebKit again, and this is Bitcoin again. What is going on? We have different variants of the same project being included in the study. So we went and looked, from one project to the next, at which projects had commits in common, meaning they were somehow copied from one another, and we found that the first 17 are variants of Bitcoin. What happened? Bitcoin is highly starred, and people like to fork it, so very likely we are double-counting bugs. Now, this could be fine, or it could be bad, but it's certainly not something you can just ignore. It's only about 2% of the data, but maybe that 2% was anomalous; who knows?
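A toy illustration of the kind of overlap check described here, in Python, assuming we have, for each project in the dataset, the set of commit SHAs attributed to it; the project names and SHAs below are invented.

```python
from itertools import combinations

# Hypothetical input: for each project, the set of commit SHAs in the data.
commits_by_project = {
    "bitcoin/bitcoin": {"a1f3", "b2e4", "c9d0", "d4a7"},
    "litecoin-fork":   {"a1f3", "b2e4", "ffe1"},          # shares history with bitcoin
    "torvalds/linux":  {"9a8b", "7c6d"},
    "WebKit/webkit":   {"1234", "5678"},
    "webkit-mirror":   {"1234", "5678", "9abc"},          # shares history with WebKit
}

def shared_history(threshold: float = 0.1):
    """Yield project pairs whose commit sets overlap by more than `threshold`
    of the smaller project, a hint that one is a fork or mirror of the other."""
    for (p1, c1), (p2, c2) in combinations(commits_by_project.items(), 2):
        overlap = len(c1 & c2) / min(len(c1), len(c2))
        if overlap > threshold:
            yield p1, p2, overlap

for p1, p2, overlap in shared_history():
    print(f"{p1} and {p2} share {overlap:.0%} of the smaller project's commits")
```

Pairs like bitcoin/bitcoin and litecoin-fork, or WebKit/webkit and webkit-mirror, get flagged, which is exactly the kind of duplication the talk describes finding among the highly starred Bitcoin variants.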
The next bit was a lot of work. The data we received had the names of the projects but not their owners, and what we wanted to do was validate the data against the real world: take the projects from GitHub and compare the commits they used with the commits on GitHub. We started with the 729 projects they had kept in the study. We found only 618 projects with the same name on GitHub, which means about a hundred projects may have been deleted or made private. Out of those 618 we could only match 400 unambiguously; there were lots of projects with the same name but different owners, and we didn't know which one they had used. So we had 400 projects, and we found that a hundred thousand commits were simply missing from the data they used: about twenty percent of the real-world data wasn't there. This axis is the percentage of missing commits, and for most languages it's small, but if you go all the way to Perl, 80% of the Perl data was missing. So you have a language, you've thrown away 80% of its commits, and you're comparing it with other languages. And the commits weren't thrown away randomly; we don't know why, they're just not there. That's sin four.

Sin five: they had data from 2012 going back to 2002, I believe, and the first commit for TypeScript was dated 2003. Any issues with that? How is that possible, a time-machine leak from Microsoft before the developers had even started planning the language? So how did they decide that a file is TypeScript? They decided it because it has a .ts extension. What else has a .ts extension? Translation files, the kind used in multilingual software, and translation files are usually not buggy. This is why TypeScript appears all the way at the top of the best languages: most of its data was not code. And, oops, going back to this slide: when we remove the translation files, the largest projects that remain are three projects that contain only type definitions, so again, no code. The whole TypeScript dataset is useless.

I told you that what got me started was the comment about V8 being tagged as the largest JavaScript project. Indeed, when we looked at their data, V8 is the largest JavaScript project: it has 3,000 commits to JavaScript files, 400 to Python, 7 to C++, and 16 to C. That's weird, so we tried to understand what these commits are. It turns out the JavaScript commits are almost all tests. This was a case of: I have a bug in my C++ code, and I'm a good programmer, so I commit a JavaScript repro for that bug. Makes sense. Now, those bugs were all attributed to JavaScript, when they probably should have been associated with C++. But how can there be so few C++ files? It turns out that extensions like .C (capital C), .cc, .cp, .cxx, .c++, and .h were not included in the study: no header files for the C code, none of those for C++. The only files counted as C++ were .cpp. I forget exactly how much of the C++ code they actually got, but clearly not all of it.
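A tiny sketch of how an extension-based file classifier fails in exactly these two ways, dropping some source files entirely and mislabeling translation files; the mapping below is illustrative, not the study's actual table.

```python
import os

# A deliberately incomplete extension-to-language map.
EXTENSION_TO_LANGUAGE = {
    ".c": "C",
    ".cpp": "C++",         # .cc, .cxx, .C, .c++ and .h silently fall through
    ".ts": "TypeScript",   # but .ts is also used for translation files
    ".js": "JavaScript",
    ".hs": "Haskell",
}

def classify(path: str):
    """Return the language for a path, or None if the extension is unmapped."""
    return EXTENSION_TO_LANGUAGE.get(os.path.splitext(path)[1])

files = ["src/heap.cc", "src/heap.h", "src/api.cpp",
         "translations/app_de.ts", "checker.ts", "test/regress-123.js"]
for f in files:
    print(f"{f:25} -> {classify(f)}")
# heap.cc and heap.h are counted as nothing at all, while a translation
# file is happily counted as TypeScript source.
```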
All right, so we've gone through the steps of data acquisition, data cleaning, and data preparation; now we get to the actual statistics. I told you they were using p-values. Who has used p-values before? Yeah, you shouldn't; I'll tell you why they're bad, but let's assume for now that we like p-values. A p-value is going to tell you something about the hypothesis you're making, if it's statistically significant. But we don't actually have one hypothesis; we have 17, one per language. If we use a single p-value threshold we get what is called family-wise error: there will be more mistakes. This is well known; there is even a paper in that community pointing out that this is one of the most widely committed errors in the software engineering literature, so this is not a surprise. There are two ways to correct for it. One is called Bonferroni, and it is simple: you divide the significance threshold by the number of hypotheses, so here by 16, because I've removed TypeScript. The other is the false discovery rate; it's a little more complex, but there's a citation if you want to use it, and it works. I'll come back to what this means in a bit, but basically, fewer languages are now statistically significant.

The last sin I want to inflict on you is labeling. This is a study about bugs, and the whole point is to decide whether a commit is fixing a bug. How do you know that a commit is fixing a bug? According to the scheme the authors used, the first commit here is a bug fix, because there is "fault" inside the word "default"; that is what the system matches. The second one, which says "fixed commands", is a bug fix, because we're fixing something. The third one, which says "closes #153", is not a bug fix, because "closes" is not a keyword. Interestingly, there is a citation at the bottom of that slide, to a 2009 paper in the software engineering community that said: don't do this, don't use keywords. It was written by two of the authors of this study. So what we did is we took 400 of the commits and 15 real-world developers, and we asked the developers to label each commit, with three developers labeling every commit. That allowed us to estimate how far the keyword approach is from the truth. What we found is that 36 percent of the commits labeled as bug fixes were not bug fixes, and 11 percent of the commits labeled as not being bug fixes actually were. That's a big difference. Suddenly you're working with data that is fuzzy: you can't trust the numbers you're getting for each project, because any project may be off in some way, depending simply on how people write their commit messages.

So, to sum up. All the way on the left is the original result; all the way on the right is the result of using the bootstrap, where we basically simulate what could happen: we assume a 36% false positive rate and simulate the data we would get if we were wrong 36 percent of the time. Of the 17 languages they had originally, 13 were statistically significant. What is left once we do all the steps I've described? C isn't statistically significant anymore. Objective-C isn't. JavaScript isn't. TypeScript we've just gotten rid of, because there wasn't anything there. PHP isn't statistically significant. Python isn't. Scala isn't. Perl had 80% of its data missing, so even if it is significant, we just remove it. And in yellow are all the languages that were already not statistically significant. So we're left with four. Does that still paint a picture that has anything to say? Well, the worst language is C++, all right, fine, let's say, and the best language is Clojure, followed by Haskell, followed by Ruby. Perhaps. Maybe. Odd.

At the outset I told you that p-values should not be used, and they shouldn't. I'm not a statistician, I married one, so this is knowledge I acquired along the way, but the statistical community (I have some references on this slide) has recognized that p-values are not that useful, especially when you have large data: as your data grows, it becomes easier and easier to get statistically significant p-values.
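A quick back-of-the-envelope illustration of that point, using a plain two-proportion z-test on made-up bug-fix rates: a 0.2 percentage-point difference, far too small to care about, becomes "statistically significant" once the commit counts get large enough.

```python
import math

def two_proportion_p_value(k1: int, n1: int, k2: int, n2: int) -> float:
    """Two-sided p-value for H0: both groups share the same underlying rate
    (normal approximation to the two-proportion z-test)."""
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return math.erfc(abs(z) / math.sqrt(2))   # = 2 * P(Z > |z|)

# Hypothetical bug-fix rates for two "languages": 15.0% versus 15.2%.
rate_a, rate_b = 0.150, 0.152

for n in (10_000, 1_000_000, 100_000_000):    # commits per language
    k_a, k_b = round(rate_a * n), round(rate_b * n)
    print(f"n = {n:>11,}   p = {two_proportion_p_value(k_a, n, k_b, n):.6f}")
# The effect never changes, but the p-value collapses toward zero as n grows.
```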
So p-values are close to unusable, and if you're doing large-scale anything, pretty much meaningless. What can you do instead? You can try to estimate the practical significance of what you've observed. This graph maps a simulation where we take the error rates of the two extreme languages, the one with the worst behavior and the one with the best, C++ and Clojure, and shows the predicted number of bugs for a given number of commits. What you see is that the two curves mostly overlap: if I just gave you a number of commits and a number of bugs, you couldn't tell whether the project was written in C++ or Clojure. There is nothing there; no information.

All right, back to our paper. This is the FSE 2014 paper we've been talking about, and it was well received; well received enough that it was chosen as a research highlight and published, after a second round of reviews, in CACM in 2017. And then the conversation starts. You have social media saying "the shocking secret about static types", saying "less code, fewer bugs with functional programming", saying "statically typed functional languages win", saying "boom". I don't have anything against those people; they may be right. My problem is that what was presented was, at best, an association. The interesting thing is that the authors were careful in the statement of their results: they wrote that some languages have a greater association with defects than others, although the effect is small. They were careful, yet everybody who read the paper jumped to the conclusion that this causes that. Maybe this is one of the biggest messages of this talk: as scientists, when we write things up, we should be careful not to lead people to those conclusions, and as readers, we should be careful not to jump to them.

We did one last thing, a citation analysis. We took 120 papers that cited the original work, both by other authors and by the authors of the paper, and we counted. 77 of them had just cursory references, the kind of citation where you have ten citations in a row and this was one of them. Twelve talked about methods: they said the FSE paper used that kind of GitHub analysis, and so on. Two papers cited the results of the study using association or correlation language: they said yes, they found an association. And 24 used causal language, as did three papers by the authors themselves, who had been careful when they wrote the original work to say the right things, but then, maybe, enthusiasm took over.

All right, we're almost done. We did this, and we decided to write it up, because it was some amount of work, and we put it on arXiv last year. There were positive comments from friends in England; we were called "boffins", which I had to check to see whether it was an insult or a compliment. It's a compliment, it means scientists, so I'm a boffin now. That was good. But we also tried to submit it to conferences, because we want the community to know. So what happened? We submitted to FSE, where the original paper had appeared, and the rejection was for what reasons? The first one: why do you use Bonferroni and not FDR? OK, so in the revision we used both, and it doesn't make a difference.
Then: your paper doesn't have code and data attached to it. Well, it was double-blind, we weren't supposed to attach them, but fine, of course we have code and data. And what was the third thing they didn't like? Oh, this one is longer: they said the bootstrapping method is clever, but it relies on limited labeled data, and they suggested we have more people rate the commits, which we did in the revision; that's where the 15 real-world developers came in. So these are not horrible comments by any means; all of this is true. We were disappointed, because if somebody came to my work and said, "hey, you're an idiot, you published crap", I would say, OK, tell me how, and I'll try to fix it. So then we tried ICSE, the other conference in that community, and what did we get there? "The reanalysis actually confirms the original conclusion." OK, I guess we had numbers, they had numbers. "Essentially the same results, given some differences in operationalization." OK, given the fact that they were wrong, sure, they were fine. And the last one: "this paper appears politically motivated." I'm such an apolitical guy; why would I be controversial? Why did they think that? I don't know. The status of our work is that it got accepted at TOPLAS, which is nice, so it will at least appear. I don't think many people in the software engineering community will read it, and I guarantee you the number of citations will be an order of magnitude less than the original's, but at least it's out there. The truth is out there. Voila. And there's an artifact; you can redo anything we did; the paper is fully reproducible. Thank you very much.

[Audience] Did you at least get a thank-you from the original authors?

So the question is: did I hear from the original authors? Yes. When the paper appeared on arXiv, they contacted us, and they complained about one thing in The Register's article: Emery had made a joke about "if you torture the data enough, it will confess", and they said you have to retract that from the Register article. We asked why, and they said, OK, then we will talk to you once your paper is accepted at a peer-reviewed publication.

[Audience] First of all, thanks, of course. And second, this is large-scale research, which is kind of scary. [Partly inaudible.] I was wondering about the definition of a bug here: it was annotated in some way, and you did it in a somewhat different way, but it only captures how often bugs got fixed, not really how often bugs appear in each language. So maybe if you did some math on the issues created and how quickly they get fixed, you would get something?

I know where you're going. There are two things in what you said. The first is that it's based on large data, so it's scary if it's wrong. I would claim that the more data you have, the more likely you are to get bad answers, because all of these mistakes took us time to find; we really had to understand the data. So I would submit that I trust large-scale studies much less than small ones. The second thing is that everything I described here are small details; the fundamental point is that I don't think that with raw GitHub data you will ever get the answer to that question. There is so much more going on. A project that has many customers will have more bug reports and therefore more bug fixes, and there are so many things like that. The data simply isn't there to answer it.
But if we had written a letter saying, "hey, this is not possible, your paper can't answer that", nobody would have paid attention. If we can show the paper is wrong on its own grounds, using its own metrics, then in the discussion section we can write: oh, and by the way, there are many more reasons why you shouldn't do this.

[Audience] I have two questions. Question number one: given that, as soon as a paper is published, it's pretty hard to refute it, what do you think should change in the reviewing process and everything? Should we make it easier to review already-published papers? That's question number one. Question number two: you said that as a scientist you have to write things carefully so that people don't jump to conclusions. The problem is that people will jump to conclusions anyway. So what do you think should be done in such a case to prevent people from actually jumping to conclusions?

Let me take them in reverse order. How to prevent people from jumping to conclusions: I don't know. You voted for Brexit, we voted for Trump over there; the population will make some amount of mistakes from time to time, and there's nothing we can do. All we can do, as scientists, is write things clearly and avoid clickbait. And what was the first question? Right, given that it's hard, how do we assess the reviewing aspect. So, the reviewing aspect: I don't fault the FSE reviewers for missing all of the things I showed you, because they're hidden in the data, in the mass of it; they're not in the paper. There's no way the reviewers could have caught them. What they could have caught is: even if you do all of this, you're not going to get the answer. But that's hard to argue. So I think, no, this paper should have been accepted. Where our community fails is that we don't value reproduction studies. Normally the scientific standard is: somebody says the moon is made of cheese; OK, maybe, from afar it could be; and then somebody else says, "yes, I can confirm that, I did the calculations", or says, "no, this is nonsense". That's the normal way. Reviewers have so little time, such a small window to judge, that they're mostly asking: is it plausible, and is it interesting? And that's the other problem: reviewers tend to look at shiny things. If you are very careful about your claims, if that paper had been very careful about its claims, it probably would not have been published. They want something that sounds noteworthy. If you just say, "sometimes I get more bugs with some languages, but really we can't conclude anything", it would never have passed.

[Audience] This is lovely, and I'm very glad you did this, but I do have some objections. First of all, I object to your last statement, that this paper should have been accepted.

I did say it should have been, yes.

[Audience] Then I object to that; I think that would be a wrong conclusion. You're saying there is a way to even do this right, evaluations across 17 languages? I think that paper should have been rejected on premise: anything that tries to draw these over-general conclusions when there are such evident biases to begin with. As a community, we should be favoring very clear, small questions. If you were to tell me, "I will do an experiment, Haskell versus C++,
with a lot of experimental subjects", then I might start paying attention; at least it's a very clear question. So I will not agree that this paper in principle should have been accepted, but I do agree that, the way our reviewing processes work, it results in this looking like a great paper.

Yes, I agree with you on the fundamentals. What I meant was: all of the mistakes we found, I don't think reviewers could have found by reading the paper. The more fundamental part, whether this set of research questions and this methodology is even sound, there I agree with you: it is not sound, and I don't think they can ever get the right answer. But the mistakes I'm mentioning, you couldn't have seen them without doing the work we did.

[Audience] Do you think one can do this right? Not that particular study, but in that direction: can someone do a statistical study and answer the question of whether some languages are more error-prone than others? Do you think there is a way to do it right? I'm not saying that if you did it and found no correlation that would settle it; that just says inconclusive. But could someone say there is definitely no relationship, or there is some relationship?

I think there are so many confounders that you would have to untangle. Honestly, I don't know. I tend to think no.

[Audience] I would also tend to think no, just because of pre-existing biases, like how one chooses a language to begin with.

There's a question over there, Phil, and one here.

[Phil Wadler] It's very clear that it would be very good if we could apply empirical techniques to resolve some of these questions. We jump up and down, I jump up and down, and say functional languages are great, but in terms of empirical evidence of the kind that scientists accept, there is very little on the ground. So the fact that they did this study at all, I think, is very much to be applauded, and it hadn't occurred to me to look at GitHub data to get at this. But any time somebody does a study of this kind, it's always very easy to attack it, so I'm slightly worried that, after you've done this, the next person thinking of doing such a study will put their head down and say, "oh no, I'd better not do it, because Emery and Jan will attack me."

But this was careless. I'm sorry, some of this was really sloppy. It's a good question, but there were mistakes that are really glaring when you actually look at the data. I have sympathy for the authors, but it's wrong. Well, what can we do? I can applaud the attempt. What I do mind is that people keep citing this, just because it goes along with our commonly accepted wisdom, and now it's a thing: you can say all these languages are bad because, you know, FSE 2014. I mind that, because it's not supported by the data. It may be true, but it's not supported by the data. Don't put a citation there; say "I believe". That's fine; I believe lots of things, and I think we should be allowed to express beliefs.

[Audience] There's a question here. I just want to refine a question you were asked. You said that you don't know of a way to do it right. But what if the effect were very large? Suppose the difference between C and, say, Haskell were ten times more bugs. Would that be easy to detect then?
And does that mean that, if you've not found anything, that is any sort of evidence against the effect being very large?

I think we just can't tell. The data is so confusing, and there's so much noise, that there may be a large effect hidden in it. The majority of bugs are bugs of the kind "hey, the TCP/IP port number is wrong", and unless you have a very powerful type system, with dependent types, you're probably not going to catch that; it is not specific to the language. So 36 percent of the bugs are mislabeled, and amongst the remaining ones, a lot are not relevant to the question you're really asking. How much signal is there?

[Audience] Just as a comment, I think there is something political here. Clearly it's the PL community versus the SE community.

Very much so.

[Audience] And I'm not sure everyone in the room knows that SE is winning, very convincingly. They move fast and break things, they don't care so much about what they publish and how true it is, and as a community it is growing a lot; it's much easier to enter, and ICSE had 1,800 attendees this year. No refutation of any visible SE result has made the slightest difference in the careers of the people involved in them. I can name numerous examples, including the first author of the study you're debunking. They're doing fine, and we are just jealous.

Yeah, let's be cynical here: the most mediocre senior software engineering researcher has more citations than you or me, or poor Phil, for that matter. Maybe you need to go a little higher in how senior they are, but really, they're a large community and they publish lots of stuff. They're winning. I won't say any more than that; I'm happy to be the grumpy old man in this story, grumbling away in my backyard. All right, thank you very much.
Info
Channel: Curry On!
Views: 4,118
Rating: 4.9327731 out of 5
Keywords: CURRY ON
Id: ePCpq0AMyVk
Length: 51min 59sec (3119 seconds)
Published: Thu Jul 18 2019