AI Assistant vs Property-based Tests

Video Statistics and Information

Captions
Hello, it's Duncan. Our type-driven ten-pin bowling simulator is pretty much done, but I still have a nagging doubt about whether the scoring is correct. At the moment the tests of scores are based on some example games that I dreamed up, plus a single test that all strikes score 300, and a property-based test that if we never roll a strike or a spare, the score is the sum of the rolls. In order to have full confidence we definitely need more tests, but manually creating them and doing sums to work out what the score should be would be tedious and error-prone. Computers are supposed to save us from tedium and errors, and I've seen a lot of people using AI for test generation, so today I'm going to ask AI Assistant for help. The results are interesting but not yet good enough, so I went back to property-based testing and found a way to combine jqwik with AI-generated code that gives the confidence that I wanted for very little effort on my part.

Before we go into testing scoring: our property-based testing last week revealed a bug. It didn't actually fail a test, but I happened to notice that we were rendering the final frame wrong if you scored a strike with the first ball. So as well as a bit of a tidy-up, I wrote this test, which is for rendering a final frame strike followed by misses, and you can see here that we failed. We rolled a 10 and then a zero, so we should have an X and then a gutter ball, but what we've actually rendered is an X and a slash. Now that slash shows a spare: it means that the two rolls have added up to 10. Well, the two rolls have added up to 10, but only because the first one was a strike. Normal frames can't be a strike and a spare, and there we put the X in the second spot, so this really is a pathological thing, but we should fix it.

So let's go and find the code that renders that frame. Here it is in rendering: we've got Frame.toScorecard, and you can see this is an extension function that looks at the type of every frame and decides how to render it. Here we're rendering roll two as
a slash if it's a spare. Now, we still want to render it as a slash if it's a spare, but in this case we aren't a spare. Let's see what isSpare is defined as. Here it is: it's actually a method in BonusInProgressFinalFrame, and you can see here we say that we're a spare if our total pin count value is 10, but we don't ask whether the first roll was a 10. So I think what we'll do is say that we are a spare if the value of roll one is not equal to 10 and the total pin count is 10. Let's see how that does. OK, still failing, but on the second one now, so we've fixed the first one. Let's go back to rendering. BonusCompletedFinalFrame has the same issue. isSpare is defined here, and I think we should say the same thing: the value of roll one is not equal to 10, but the sum of the first two is 10. Run that, and we're good.

I must say that finding a bug like that begins to shake my confidence in a codebase a little bit. It wouldn't surprise me if we found other rendering issues like that. But let's at least commit this with "Fix final frame rendering where an initial strike doesn't mean a spare". A bit woolly, words-wise, but at least it's polished off.

OK then, now for today's proper task, which is to torture-test our scoring. I pulled out the scoring-specific tests into BowlingScoreTests, and you can see we've got two of them. The first one we wrote by hand, because we knew the maximum score was 300: if we roll twelve 10s, then we expect that the game will be completed and the scores will be 300. This listOf is around the fact that we're playing a game with only one player here. The other scoring test we have is that a completed line with no strikes or spares scores the total of the pin counts. Here we let jqwik inject random small pin counts, which we then add up to give our final score. A small pin count is defined to be something between zero and four, so there's no danger of scoring a strike or a spare. Of course, the problem is that strikes and spares are the
interesting thing about bowling scoring. But I suppose it's not true that we don't have any tests for those at all, because we had these bowling tests that render a scorecard, and in these we've rolled strikes. This one is rendering spares, and you can see that once we've rendered the spare, we do check the scores are as we expect. But these are example tests, and the problem with example tests is that we only have the examples that we've dreamed up. As we just saw with rendering a final frame strike, sometimes we don't have enough imagination to find bugs that are actually there.

So back in the score tests, one way we might generate new test cases is to ask AI Assistant. I'm going to take this test as an example and ask AI Assistant to generate some more cases for us. So we've got AI Assistant, Generate Code. I find it helps to be polite, so we'll say "Please generate some more tests like this one for ten-pin bowling scores". OK, so the generate command brings up a diff, and I've got to say initial impressions are very good. It's generated "minimum score is zero with all gutters": a list of 20 zeros, which should give a list of score zero. Well, that's very plausible. "A perfect spare game scores 150": so if we have 21 fives, that again sounds very plausible. Let's have a look down and see what else it did. Oh no, it just got rid of a blank line at the end. So I think we can accept all of these and try running the tests. Well, I must say it's very impressive, especially as it matched the style of the test we gave it as an example. I think I could probably have generated those two myself in not a lot less time than it took AI Assistant to do, but still, very impressive.

Let's try "Please generate more ten-pin bowling tests like this one, but with a mix of strikes and spares and normal frames", and see how we get on. OK, that was over a minute, so we obviously gave it something to chew on. Let's see what it's given us. Ah, well, that's interesting. So we get mixedPins, which is a 10 followed by two
fives, a three and a four. Well, it's a little complicated. I'm not sure it's right, but let's accept it and try running. Oh, let's see what we got, shall we? We're getting a ClassCastException: we got a completed game, which can't be downcast to a playable game. That suggests that it tried to roll too many things. Let's go and have a look at that test. Well, OK, I'm a little confused, I must say. It's not at all clear to me how many pins we are creating here. Let's find out by printing mixedPins, and we'll debug just this test. Oh yes, goodness me. I think we've actually started there with, what is that, 12 strikes followed by 12 fives. Well, we know that 12 strikes is enough to end the game: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12. Yes, the game will be over by the time we get to here. That's no good at all. Plus, even if it had done a good job, it's decided the expected score is just the sum of the mixed pins, not accounting for all the special rules.

Now, unless I'm missing something, one of the problems with AI Assistant's Generate Code is that you can't then talk to it. We go to here, all chats, 15 minutes ago, and no, these just seem to be empty. So close, but no cigar that time. I think what I'm going to do is delete this, and we'll try the same thing as a chat. So I'm going to take this code here: AI Actions, New Chat Using Selection. That will do me. There we go. And now I'm going to say the same thing: "Please generate more tests like this one for ten-pin bowling with mixes of strikes and spares and normal frames". And I think I'm going to prompt it a bit more to say "Make sure that the score is calculated correctly". Let's see, shall we?

This bottom one here is the same as we've got: perfect spare game scores 150. It agrees about the score, but it's copied my style slightly less well than previously. A game with no strikes or spares: that's rolling just fours. Well, that's not a bad test, but it's effectively what we're doing down here with our property, so I don't think we gained anything from that. A normal
frame followed by strike: well, that looks plausible. I'm going to take it and put it in here. Again, not quite the same style, but good enough. Spare followed by three pins roll: again, don't know, but we'll take it. Strike followed by two pins roll: well, I haven't got one of those. Let's see how it fares. Splendid. So we're making progress, although these do still seem to be quite easy cases to me.

Let's ask for a harder one. What about alternating strikes and spares? Well, that's impressive if it works. I'm not quite sure about this mutable list of pin counts, so let's just ask it to fix that: "Can you not use a mutable list, please?" Oh, I quite like that. Let's take it, copy it, put it in here and run. Oh, that's good. It's of course hard to know whether these scores are actually right. They agree with the scores that we've got, but we're asking for two things to happen: we have to be right and AI Assistant has to be right. So we might be in danger of confirmation bias, but let's push on.

Let's ask, maybe finally, "Are there any other nasty cases we should be trying?" OK, let's go back up to the top of that and see what it says. "I can suggest a few scenarios that might be worth testing": minimum score, spare in the last frame, strike in the last frame. Neither of those two, I think, have we explicitly tested here, although we did have some of those in our bowling tests. Continuous spares and continuous strikes: I think we have covered that. Two rolls in a frame adding up to more than 10: we do have a test for that, but let's see what we've got. This one we don't need to do. Let's take this one, it looks useful. Ah, interesting, drag doesn't work. Strike in the last frame allows for two extra rolls. Continuous spares score correctly: well, we have that already. So I'm going to copy these into here. Continuous strikes: we've done that one, that's 300. Two rolls in a frame can't add up to more than 10: well, I'm going to copy that and see what we get. Well, I think we're using a JUnit 4-ism there. Let's ask it to fix it: how do I do the last one
with JUnit 5? I'm not entirely sure about the style, but it will do. Let's copy it and put it into there, import that and that, and run.

I'm a little suspicious now: I only appear to be running one test. Let's go and make sure we're running all tests. Oh, there's the confirmation bias. It turns out that I was still running the one test that I was debugging earlier. Let's go and see what actually... oh my goodness. It seems that most things that AI Assistant generated for us aren't working. Let's find out what they say. OK, now I feel like a bit of a fool. We're into exactly the sort of debugging that we don't normally have to do in test-driven development, where a lot of things are broken. I suppose at least it's only tests that are broken. I mean, we haven't changed our production code, although the tests may be showing that the production code is broken. I think I'm going to go ahead and debug, if only to find out the sort of errors we're getting.

So here we'll println rolls and run just that one. Ah, and here we see the issue. We rolled a 10, that's good. We rolled a five, but then we need to roll another five to make it a spare. What actually happened is we said we rolled 10 on top of a five, but there are only five bits of wood to fall down, not 10, and our code has correctly rejected that. So unfortunately, let's get rid of that one. It's annoying that it was so plausible.

Run all the tests again. Strike followed by two pin rolls: well, it looks like in this case we completed the game too quickly. So here we've got a strike, that's one frame. These two will make another frame, so that's two. Eight frames left, so I think this should be 16. Is that true? It turns out yes, it is, unless our code is wrong, but I think at the moment I trust that more than I trust AI Assistant. Two rolls in a frame cannot add up to more than 10: well, I think we were asking a lot for that to succeed, given that we haven't generated this error message. I'm not sure there's anything in this one for me, so I'm going to delete it. Strike in
the last frame allows for two extra rolls: now, we managed to complete this game, but the expected score was 110 and we said 160. I think I may need to appeal to authority rather than maths here, so here's what Bowling Genius says, and who wouldn't trust a genius? It says the score from a run of spares followed by a strike followed by a spare is 160, which agrees with our scoring rather than AI Assistant's. So I think I'm going to say that test is perhaps useful as an edge case, but we have to correct the score.

OK, spare in the last frame. Well, actually, thinking about it, that is only the same as a run of spares, which we'd already scored somewhere, hadn't we? Ah yes, there you are: 150. So it can't be both 105 and 150, but I suppose it's all the right digits, just not necessarily in the right order. I don't think we're adding anything by keeping that test, so I'm going to delete it.

Last one, maybe. Oh no, two. Normal frame followed by strike: we again disagree on the score, and again Bowling Genius thinks that our code is right, not AI Assistant. Not a bad test, maybe, so we'll make it pass that way and go on to this one. Here we completed too early. I suspect that should be 15 instead of 16. Let's find out. No. Oh, I'm going the wrong way, so I'm going to say 17. But we do at least agree on the score. Let's go back to making sure we're running all the tests, and eight tests sounds more awesome.

I'm not sure how I feel about these tests now we've got them. They largely feel like a bunch more arbitrary examples, or things that maybe I would have generated myself. But I'm going to check them in for the record, so this is "Add AI Assistant generated scoring tests, fixed by a human".

So if we're unsatisfied by just yet more examples, can we do better with properties? Now, there are some simple properties of scoring that we might test: for example, that the score in a frame is never less than the score of the previous frame, or that the score of a strike or spare frame is always higher than 10. I can think of quite a few properties, but they all
feel either too simple or too complicated to test. But if we had an independent arbiter of what the score should be for a game, then I guess we could just get jqwik to generate valid turns and see whether our code and the independent arbiter agree. That's effectively what I was doing with Bowling Genius, but a lot less by hand. So we could either do a clean-room implementation of bowling scoring, or rely on the fact that there are enough examples of the bowling score kata online for AI Assistant to do a good job of it. The second one sounds easier, so let's ask: "Write code to score a bowling game given a list of pin counts". Well then, that's broken everything by putting it above the imports, and my goodness, it seems to be more complicated than I'd like. I'm going to ditch that, I think, and we'll go back to the chat. Let's try "A simple function to calculate a ten-pin bowling score from a list of Int representing pin counts". Let's have a look. Well, I'm not sure what I'd write, but here I'm going to take the wisdom of the crowds and reckon that the more mutable it is, the more likely it is to be right, because more people will have written it like this. So I'm going to take this. Oh, can't drag still. Oh well, drag did work, it just didn't scroll. There we go. Copy that, and I'm going to put it into here and see whether it compiles. It does. First thing I'm going to do is remind myself that this was AI Assistant, unmodified.

Now let's see how we can use that. Here in our properties tests we've got a very simple valid game property, so let's take that and move it into here. We're going to need these providers to be in scope as well, so valid turns and valid final turn. Let's go to the properties test and find those. So, valid turns: we need to copy those into here. That needs to be public. Let's see whether everything compiles and runs. It does, good. Just check, because I'm getting paranoid now, that we are running "there are no errors in a valid game". There, good. And now let's take these two as a variable, which is allTurns, and
we'll look at the type of that. So that's a list of turns, and I think we should be able to turn that into allRolls, which is allTurns flatMapped with it, which is a turn's rolls. If we're right, the type of that is list of PinCount, which is good, because we can now say val expectedScore is calculateBowlingScore of allRolls mapped with it.value. And now let's just pull out something like this as the test down here, and that should be a score of expectedScore. And now the question is: do you feel lucky, punk? Well, that's encouraging. Let's give ourselves a bit more confidence with a println of endGame.toScorecard, and I think on the end we'll add on expectedScore. Now, we don't seem to be able to run just this one property, interestingly, so I'll run everything, which still passes, and look at, well, the thing we called "there are no errors in a valid game". That's very plausibly zero, and I can believe that's 117. It at least looks like we're comparing the right thing with the right thing, and generating some pretty random games. Splendid.

So I think we will rename this. Let's try not to claim too much, but we'll say the property is that the final score is the same as another calculation. A little bit of a tidy: I think we might say we don't need to know that, but if we didn't say that, we could add the map in here, which would allow us to say that. I think we might inline that, move that after we've done everything, and get rid of that. And once again, go back and make sure that we are running all of our tests: 32 of them, brilliant.

So that leads us to the question of what examples we want to keep. Let's go on up and have a look. This property I think is a good one. It will be subsumed by this one if we throw enough examples at it, but it's a good backstop. This case is not really about scoring, I think, but I might leave it. But this one here leaves me a bit cold. Ditto the spare. Normal frame followed by strike: well, again, we're just rolling zeros, so that's not very interesting. Perfect spare game scores 150: I think that's good. Minimum
score is zero: I think that's good. Maximum score is 300: that's good. I might just reorder those for no good reason, and there we have it. We'll commit with "Add property-based tests for scoring and remove some", I'm going to say, "arbitrary examples". Ah, I think this should be private. Run one more time with feeling, and commit.

You know, I've just been sat here looking at this calculateBowlingScore function, and I realised that I really have no idea how it works. I believe it's doing an interesting thing with the indexes, which feels as if it will work in the final frame if there are three rolls, and it agrees with the scores that we're getting. But if this was fly-by-wire software or a pacemaker, I think I might want to understand it better.

Anyhoo, I think on that note we're done for the day. I think we've done ten-pin bowling to death, although I do feel maybe a review of how well this mix of type- and test-driven development worked out for us might be useful, maybe just as a bonus one. If you'd like to see that, then please subscribe to the channel, click the like button so that YouTube keeps these episodes in your feed, and you might like to buy the book that I wrote with Nat Pryce, called Java to Kotlin: A Refactoring Guidebook, details of which are below. Thanks for watching.
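
The captions never show the generated scoring function that the property test uses as its independent arbiter. For readers following along without the video, here is a minimal sketch of that style of index-based calculation. It is not the code from the video: the name calculateBowlingScore and its signature are assumptions, and the sketch assumes it is only ever given the rolls of a complete, valid single-player game.

```kotlin
// Hypothetical reference scorer, written for illustration only.
// Assumes `rolls` holds every roll of one complete, valid ten-pin game.
fun calculateBowlingScore(rolls: List<Int>): Int {
    var score = 0
    var index = 0 // position of the first roll of the current frame

    repeat(10) {
        when {
            // Strike: 10 plus the next two rolls; the frame used one roll.
            rolls[index] == 10 -> {
                score += 10 + rolls[index + 1] + rolls[index + 2]
                index += 1
            }
            // Spare: 10 plus the next roll; the frame used two rolls.
            rolls[index] + rolls[index + 1] == 10 -> {
                score += 10 + rolls[index + 2]
                index += 2
            }
            // Open frame: just the two rolls.
            else -> {
                score += rolls[index] + rolls[index + 1]
                index += 2
            }
        }
    }
    return score
}
```

Because a strike or spare in the tenth frame looks ahead to the bonus rolls rather than treating them as a new frame, the same three branches cover a final frame with three rolls. With this sketch, twelve 10s score 300, 21 fives score 150, and 20 zeros score 0, matching the example tests kept at the end of the episode.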
Info
Channel: Refactoring to Kotlin
Views: 451
Id: I5vjCHIq5xs
Length: 24min 8sec (1448 seconds)
Published: Fri Mar 22 2024