Refactoring to Java 8

Captions
I will be talking about refactoring to Java 8. This has been a surprisingly popular topic over the last year or so, because we've seen presentations about how Java 8 has introduced new features like lambdas and streams and so forth, and we've seen presentations about how to use them in new code. But what's more applicable to most of us as day-to-day developers is: how do we take an existing code base and move it in the direction of Java 8, while also understanding some of the pros and cons of doing this, and some of the trade-offs? That's what this talk aims to answer.

But before we do that, let's take a quick look at why we should move to Java 8. Java 8 is now nearly three years old, so I'm guessing a lot of people are already using Java 8 or considering it. Probably one of the most compelling reasons is the fact that Java 6 and 7 are end-of-life and no longer officially supported, so it's a good idea to move to Java 8 just for those reasons. But there are some really compelling arguments you can use when you're trying to push your case to management, or to the support team, or to whoever inside your organization cares about these things, about why you should be adopting some of the Java 8 features.

Firstly, Java 8 is generally faster than previous versions of Java. You don't even have to do anything or use any of the new features to get some of these speed improvements. For example, some of the common data structures are faster, there are some speed improvements around concurrency, and so forth. For more information about this, click on that link, because there are some links to additional documentation about that. Of course, it's easy to parallelize: that's the point about some of the stream operations, the fact that you can run stream operations in parallel without having to manage things like thread pools or executor services and so forth. It can be done quite easily for you. You write fewer lines of code. As a hardcore, long-term Java
person, maybe this wasn't the most compelling reason for me, but in actual fact, once you get used to using the Java 8 style, you start to see that what you're writing is what you want to achieve, not how to do it. That's quite nice: it takes away a lot of the boilerplate that your eyes were kind of ignoring and lets you focus on what's really important in the code. And being able to do things like pass around lambda expressions or use the Streams API gives you new solutions to problems, and perhaps lets you solve things you couldn't have solved before.

One of the areas I think is under-publicized is that some of the Java 8 features let you minimize errors. We'll see some examples of where we can minimize things like copy-and-paste errors using lambda expressions, but there are other things, like Optional, that let you minimize the number of errors that accidentally creep into your code. Also, things like the Streams API, the added readability and the reduction of boilerplate can let you see much more easily where you've made a silly mistake.

So let's go straight on to the refactoring side of things. But before we do any sort of refactoring, we really do need a safety check to make sure that we're in the right place to do some refactoring. Firstly, it's really important to have automated acceptance tests if you're going to do any refactoring. With a lot of the refactorings I'm going to show you, the idea is that the functionality will remain the same, but you can't be sure that you haven't accidentally changed some functionality if you don't have automated tests that prove to you that your code still does the same thing it used to do. Performance tests are important for the same sort of reason, and in fact we'll talk quite a lot about the performance of some of these refactorings as we go through this talk, so you'll get an idea of how using Java 8 idioms impacts the performance of your code. If performance is important to you, it's
important to have tests to prove that you haven't done something weird to impact the performance. When I say performance is important, I don't necessarily just mean nanosecond latency for high-frequency trading or whatever. Sometimes I mean things like responsiveness to the user in less than half a second, or how long it takes an Android app to make a call to the back end. So I'm not necessarily talking about high performance; I'm talking about making sure that you have performance tests for the things that really matter to you performance-wise.

Before you do any sort of refactoring, you really need to decide as a team, as an organization, why you're going to do this refactoring. There are a number of different reasons. We've talked about things like performance, readability, removing boilerplate and removing errors. You can also do refactoring for learning: you might want to upskill the team on a code base and a domain that's familiar to them, so they get used to using Java 8 idioms in an environment that's comfortable to them, rather than trying to get them upskilled on a new language with new idioms on a new domain on a new project. So refactoring in order to upskill the team is also a valid thing to choose. It doesn't really matter which of these goals you choose, but the idea is to have in your head what you're trying to achieve by doing your refactoring, and stick to that.

You should limit the scope of any refactoring you're doing. In these examples we're largely going to look at the method level, so they're going to be small bits of refactoring with fairly limited scope. You might want to restrict things to either a method level or a class level. We certainly shouldn't go straight in and try to refactor the whole code base to use Optional, for example. You really want to ring-fence the area that you're doing these refactorings in.

The code base we're going to be looking at is a
project called Morphia. This is an object document mapper for MongoDB, and there are a few reasons why I've chosen it. Firstly, I was working on it when I was working at MongoDB, so it's a code base I know very well. More interestingly, it's an open source code base, so you can take a look at it for yourself and see the before and after of the refactorings. It's also a fairly mature piece of code, so it has a lot of the same characteristics you'll probably find in a lot of your own code bases: bits of code that have maybe evolved in a suboptimal way over time, or bits of code where it's no longer fashionable to follow those ways of doing things. So it's quite a nice code base for refactoring, as you can look at some of these chunks of code and think of better ways to do these things, especially using Java 8.

So without further ado, let's dive straight in and start doing some refactoring. The first thing I want to look at is refactoring to lambda expressions, because this is probably the simplest and most automated refactoring that we can do. The easiest thing we can do is take any anonymous inner classes that implement things like Comparator, Runnable, Callable or Predicate, all of those interfaces that only have a single abstract method, and use lambda expressions instead of anonymous inner types.

So let's go straight into our code. What I've done is run inspections on the Morphia code base. I'll take a quick look at this, because I'm in the fortunate situation where I can actually show some of the IntelliJ-specific features in this webinar. I've got an inspections profile called Java 8, and I've selected a subset of the inspections. Most of the ones that I'm interested in are under Java, in "Language level migration aids". So here we have things like: we are selecting to find
anonymous inner types that can be replaced with lambda expressions. There's a whole bunch of stuff around how we can replace loops with the Streams API, and a lot of other features here. So this is the main set of inspections we're going to be running. I'm running a few more, and we'll see those as we go along, but this is where you want to configure them: in Inspections, under "Language level migration aids", you might want to turn some of those on.

Now, I've already run those, so let's have a quick look at the first set. This is "Anonymous type can be replaced with lambda". If we look at this example, we can see this is a predicate, a Guava Predicate, and we have our anonymous inner type with its single method. IntelliJ can just automatically turn that straight into a lambda expression. So that's a fairly straightforward piece of magic, whereby we remove all the boilerplate of our class declaration, our method and our type information, and keep just the business logic, just the stuff that we care about, and put that into a lambda expression.

Let me show you another couple of examples. I think with the maturity that Java 8 has, probably most people in this webinar have already come across these things, so I'm going to skim quite quickly over some of the introductory-level stuff. Here's another example, where we have a Runnable. Again, the Runnable can be converted into a lambda expression, and here we just keep the core of what the run method does.

A slightly more interesting example is this Callable here, which has two lines. When we shrink this down to a lambda expression, you can see the two lines in the lambda expression. What I prefer to do personally is extract these into a method and give it a description, so this is "until ten items found". This gives me a couple of benefits. One, I can shrink my lambda expression right down, and I can also even maybe turn that into a method reference if
the call is the correct shape. But mainly this is much more descriptive: now I can say "until ten items found", so it's much clearer to me what these two lines of code are doing. On top of that, if I have any errors and I want to debug this, it's a bit easier to debug lambda expressions if the code is actually inside a method. So that's my main reason for making my lambda expressions a single line long.

So that's all fairly straightforward. What's more interesting is this: those examples were anonymous inner classes that were implementing an interface. We can do the same thing with abstract classes, but we have to go one step further. Let's take a look at Runnable (oh, that's not the right shortcut). Runnable is a functional interface: an interface with a single abstract method. If I look in my code, I can find abstract classes that look fairly similar. Here I have an abstract class with a single abstract method on it, and IntelliJ is suggesting that I can convert this to an interface, which I'm going to do. Once I've converted that to an interface, I can tag it with @FunctionalInterface, and then I can go around and look at the implementers of this, and now I can turn these into lambda expressions. I couldn't do that before, because lambda expressions don't work with abstract classes, only with interfaces. So that's just an additional step that we can do.

Those two things are just taking your existing code and applying new syntax to the existing functionality that you had. So let's have a look at the performance. Here we've got one of those examples: an anonymous inner class and our lambda expression, and these are the two things we're going to compare in our performance test. In our performance tests we're measuring operations per millisecond, so more operations is better. I've got my smiley face at the top to remind me which is the better performance. So here we can see the
lambda expression outperforms the anonymous inner class. Not by a lot, it's fairly equivalent, but the lambda expression has better performance. Now obviously Oracle has done a lot of performance testing around lambda expressions, and there are lots of different types of anonymous inner classes and lambda expressions, so the performance of them varies according to what you're doing. If you are interested in the ins and outs of this, you can follow this link here, and that will take you to a much more in-depth document about performance testing these things. In this particular example you can see that sometimes the lambda expression, which is the one in this reddish color, is better than the anonymous inner class, and sometimes it's not. So the answer is that generally the performance is fairly similar, but it's not exactly the same.

In terms of the performance analysis: are lambda expressions faster or slower than anonymous inner types? Well, they're more or less the same, which means that you can go ahead and do this refactoring automatically without worrying too much about it. But if it really matters, then obviously you want to measure each individual instance to make sure that it's not adding some performance overhead.

Now, that was a straight replacement of anonymous inner types with lambda expressions. What's more interesting is some of the patterns we can follow now that we have the ability to pass around behavior and not just objects. So we're going to find some places in our code where we can start passing lambda expressions. What I've done is I've created a structural search, so let's take a look at it. I've created a search which is going to look for a call on the log of any level with any message, and what I want to do is replace this with something which takes a lambda expression. I'll show you why in a minute. If we look at one example of these, we're doing a debug log call. Now normally, in most of your code bases, you'll see something like this:
if the log is debug-enabled, then do that. The check could be inside the debug method, and in fact it probably is as well, but the reason to wrap it in the if statement here is that you're actually building up a string; you're creating a new string. In this case, this stages field is a list, so it goes off to that list, builds up a string which is a representation of all the items in that list, appends it to the "stages" text, and then calls the debug method with that string. Now, if you're in production you are probably not logging at debug level, so you've just incurred the cost of building up that string when you don't need it, because you're not actually logging at that level.

Now we're going to talk about an alternative way of doing this. If instead we pass in a lambda expression, what we're doing is passing in the recipe for how to create that string, but we're not creating that string just yet. We're going to do it lazily. So let me remove this here. By the way, if you want to know which shortcuts I'm using, they are flashing up at the bottom of the screen, so you can see which shortcuts I'm using to try and navigate quickly.

So here I'm going to create this method. It's going to take a Supplier of String; let's call it stringSupplier. This logger interface is implemented by a number of different implementations. What I don't want to do is go into every single one of those implementations and create the same implementation of this debug method, because I actually know what this debug method should do. So let's use another thing which is new in Java 8: a default method on my interface. In this default method I'm going to do the check for whether debug is enabled, and then I'm going to call debug with stringSupplier.get(). Now the difference here is that it's only at this point that we build up that string.
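
The lazy logging idea described above can be sketched roughly as follows. The `Logger` interface and `RecordingLogger` class are hypothetical stand-ins invented for illustration, not Morphia's actual logging API; only `Supplier` and default methods come from Java 8 itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

class LazyLoggingSketch {

    // Hypothetical logger interface, standing in for the one in the talk.
    interface Logger {
        boolean isDebugEnabled();

        void debug(String message);

        // Default method: inherited by every implementation, so the level
        // check lives in one place and the message is built only if needed.
        default void debug(Supplier<String> stringSupplier) {
            if (isDebugEnabled()) {
                debug(stringSupplier.get());
            }
        }
    }

    // Illustrative implementation that just records what it is asked to log.
    static final class RecordingLogger implements Logger {
        private final boolean debugEnabled;
        final List<String> messages = new ArrayList<>();

        RecordingLogger(boolean debugEnabled) {
            this.debugEnabled = debugEnabled;
        }

        @Override
        public boolean isDebugEnabled() {
            return debugEnabled;
        }

        @Override
        public void debug(String message) {
            messages.add(message);
        }
    }

    // Returns how many times the message string was actually built.
    static int countMessageBuilds(boolean debugEnabled) {
        List<String> stages = Arrays.asList("match", "group", "sort");
        int[] builds = {0};
        Logger log = new RecordingLogger(debugEnabled);
        log.debug(() -> {
            builds[0]++;                 // runs only when the supplier is invoked
            return "stages = " + stages; // the potentially expensive concatenation
        });
        return builds[0];
    }

    public static void main(String[] args) {
        System.out.println("builds with debug off: " + countMessageBuilds(false));
        System.out.println("builds with debug on:  " + countMessageBuilds(true));
    }
}
```

Because the level check sits inside the default method, callers no longer need their own `if` around the log call, which is also what removes the opportunity for a mismatched check and log level.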
It's only at this point that we incur the cost of creating that string. What's nice about putting this in a default method is that it is now inherited by all of the implementations of this interface; I don't have to put that implementation on all the individual implementations. So that's good for performance reasons, and we'll have a look at the performance in a minute. But what it's also good for is reducing things like copy-and-paste errors. Let me show you an example of this. Here's an example of a log statement in the code: it says log.isTraceEnabled, but it actually logs at info level. This is not uncommon in terms of copy-paste errors: we copy this from somewhere else and then maybe change the level of the log call. Now that we can pass this in as a lambda, and the if statement is done inside the info call, for example, I don't need this check, and I know that I'm doing the right thing inside the method. So I'm reducing these copy-paste errors by passing these lambda expressions around. Not only do we have an impact on performance, but we are also potentially reducing some of the errors too.

So let's look at the performance of this. Here I've got a performance test which is testing four different cases. One, where I'm logging with a constant string, which is always the same thing. Two, where I'm logging with a lambda expression which also has a constant string. In the third one we're logging with a string which is changing: I'm appending some new value to the end of it, and you can see it's just a really simple change. And the fourth one is where I'm passing a lambda expression with the same thing, but this one will be lazily evaluated inside the logger, so that if we are not logging that string we don't incur the cost of creating it. We can see here that the difference is very significant: when we actually have to create the string that we don't need, we get very
poor performance from that particular test. So by introducing lazy evaluation using lambda expressions, we have got quite a quick win for performance in our code. This is a good thing to do; we should definitely try to think about where we can use lambda expressions for lazy evaluation.

That's all I really wanted to talk about explicitly around lambda expressions. I've got way more to talk about around collections and streams, because there are quite a lot of opportunities for refactoring to use the new methods on collections and the Streams API. The first and simplest thing to do is basically find everywhere you were using a for loop before and replace it with a forEach method. Collections themselves have a forEach on them, so you don't necessarily have to use the Streams API: sometimes it's .stream().forEach and sometimes it will simply be a .forEach on your collection. We'll see different examples of these.

Let's go back to our code. Here we're looking at "loops can be collapsed with the Stream API". If we look at our first example, it says this can be collapsed with a forEach. If I apply that and replace it with a forEach, then I've quite simply removed my for loop entirely and replaced it with a forEach, with the name of the method I want to call for every entity in that set. I can go one step further now, because I realize that this entities variable is only used here, so I'm going to inline it, and once again this variable is only used here, so I'm going to inline that as well. So I've actually just removed a lot of boilerplate there, and all I'm really doing is, instead of this for loop, using a forEach to call this map method here. So it's quite a simple example.

Let's take a look at a slightly more complicated example. Here we have a for loop, followed by a for loop, followed by an if statement, and inside this if statement we're going to add things to another collection.
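
As a rough illustration of the simple for-loop-to-forEach replacement from the first example, here is a minimal sketch. The `Mapper` class and the entity names are hypothetical and exist only to make the example self-contained; they are not taken from the Morphia code shown in the demo.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class ForEachSketch {

    // Hypothetical mapper that records each entity name it is asked to map.
    static final class Mapper {
        final List<String> mapped = new ArrayList<>();

        void map(String entityName) {
            mapped.add(entityName);
        }
    }

    // Before: a classic for loop over the set.
    static List<String> mapWithLoop(Set<String> entities) {
        Mapper mapper = new Mapper();
        for (String entity : entities) {
            mapper.map(entity);
        }
        return mapper.mapped;
    }

    // After: Collection.forEach with a method reference; no stream is needed.
    static List<String> mapWithForEach(Set<String> entities) {
        Mapper mapper = new Mapper();
        entities.forEach(mapper::map);
        return mapper.mapped;
    }

    // LinkedHashSet keeps insertion order, so both versions visit entities identically.
    static Set<String> sampleEntities() {
        return new LinkedHashSet<>(Arrays.asList("User", "Order", "Invoice"));
    }

    public static void main(String[] args) {
        System.out.println(mapWithLoop(sampleEntities()));
        System.out.println(mapWithForEach(sampleEntities()));
    }
}
```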
Let's see what IntelliJ suggests: we can replace this with a forEach. Now let's dissect this a little bit. Our for loop has been replaced with a forEach, the if has found its way into the filter, we are calling this on a stream, and we're actually using map to create this new constraint violation, which is the thing that needs to get added into our set. So we've collapsed this for and if into a filter, map and forEach call. I'd be tempted to simplify this a little bit more just for readability; I might extract this and call it "create violations", but that's totally up to you really.

So we've managed to simplify that code a little bit, but we still have this outer loop here. I would look at this and think, well, maybe there's something about the design which isn't great, because what we're doing here is passing in another collection and then doing multiple iterations, because we're iterating here, this is effectively an iteration, and then we're adding stuff into another collection. So I would look at this and think that perhaps my refactoring needs to take a step back from this and look at the methods which are calling this method, to see if things can be restructured to do this in a single stream operation.

Let's look at another example. Here we have an if statement, a for loop, another if statement, a for loop, and then an if and an else. This is quite a complex piece of logic. Because we have an if and an else, we have two code paths, so we know that this can't be collapsed down into a single stream operation inside here, but we'll see what we can do. Now, IntelliJ has taken a look and decided it can do a slightly more complex thing. Because we've got two for loops, we've effectively ended up with a forEach and a flatMap, and the flatMap is a result of that second for loop there, and the if statement has been refactored into the filter. So we actually have the same functionality, the same features,
just using the Streams API.

What's interesting here is to performance test the difference between the classical for loop and the new forEach, whether you're using streams or not. This is our first example, our simple example, where we replaced this for loop with a simple call to forEach, and here we find the performance is pretty much exactly the same. So we don't really need to worry too much about whether we decide to do this or not.

The more complex example uses flatMap as well as a filter. Again, the performance is more or less the same, so if we're deciding whether to do this refactoring based on performance, we could say it doesn't really matter; we can apply this refactoring if we want to. But we still have to look at this and decide whether we think it is more readable, because we do still have another if statement outside here, and we do have this quite ugly if-else inside here. So to me, in terms of readability, there's room for additional refactoring there. We can't expect the IDE to just magically do this for us.

In the third example, where we moved this for loop and this if statement into a filter and map, the performance is quite a bit worse on the refactored code, but it's not awful: it does about half the number of operations per millisecond. It's not a factor of ten, but it's not ideal; it's not exactly the same performance. This might seem quite puzzling, because it's still doing the same thing as some of the other examples. In fact, arguably it's doing slightly less, because there's no flatMap and it's not such a complex example. However, if you look at the performance of these, in this one for example we're getting 0.6 operations per millisecond, which is not very many operations. So it could be that the thing that's happening inside the loop is quite expensive, and iterating over the loop is not the most expensive thing in this code; therefore whether you use a for loop or a forEach
doesn't really impact your overall performance. Whereas here you're getting nearly a thousand operations per second, in which case it might be that using the Streams API has added an additional overhead to iterating over these in particular. So is it a given that you're going to refactor any for loop into a forEach? Well, no. Sometimes it's going to be poorer performance-wise, and sometimes you're going to have to think about whether it gives you anything in terms of readability or performance.

The next thing that's quite straightforward to refactor is turning a for loop into a collect statement. Let's have a look at this example. Here we have a slight complication. What we're doing is iterating over a list of keys and adding the IDs into a list of IDs, so this sounds like a classic problem for collect. However, this ArrayList is being initialized with an initial capacity, which means that IntelliJ can't automatically turn it into a collect statement. So I'm going to assume that this works, and write a test to see whether this has any performance impact. To begin with, I'm going to remove that initial capacity so that I can refactor this more easily. Now I can replace that with a collect statement, and I end up with a classic collect, where I map from the key, just get the ID, and return a list of IDs from this list of keys.

Let's look at another example. In this example we're doing something fairly similar: we are populating a list of valid fields with some subset of fields from this fields list, and we can actually see what we're trying to do: we're going to ignore static and final fields. So let's get IntelliJ to replace this with a collect statement. Now, what I would like to do is get rid of this comment, because needing the comment suggests the code's not really very clear. What we can do is improve this by extracting that into a method. Let's call this isNotStaticOrFinal.
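
The two collect refactorings just described might look something like the sketch below. The `Key` and `Sample` classes are hypothetical, made up so the example compiles on its own, and the exact condition inside `isNotStaticOrFinal` is an assumption based on the "ignore static and final fields" comment rather than a copy of the Morphia source.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class CollectSketch {

    // Hypothetical key type with an id, standing in for the keys in the talk.
    static final class Key {
        private final Object id;

        Key(Object id) {
            this.id = id;
        }

        Object getId() {
            return id;
        }
    }

    // A for loop that adds key.getId() to a list becomes map + collect.
    static List<Object> idsOf(List<Key> keys) {
        return keys.stream()
                   .map(Key::getId)
                   .collect(Collectors.toList());
    }

    // Extracted predicate: its name does the job the comment used to do.
    static boolean isNotStaticOrFinal(Field field) {
        int modifiers = field.getModifiers();
        return !Modifier.isStatic(modifiers) && !Modifier.isFinal(modifiers);
    }

    // A for loop with an if that adds to a list becomes filter + collect.
    static List<Field> validFields(Field[] fields) {
        return Arrays.stream(fields)
                     .filter(CollectSketch::isNotStaticOrFinal)
                     .collect(Collectors.toList());
    }

    // Sample class used only to exercise validFields.
    static class Sample {
        static int counter;
        final String id = "fixed";
        String name;
        int age;
    }

    public static void main(String[] args) {
        System.out.println(idsOf(Arrays.asList(new Key(1), new Key("a"))));
        validFields(Sample.class.getDeclaredFields())
                .forEach(field -> System.out.println(field.getName()));
    }
}
```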
And then we can get rid of this comment completely. I would argue that in this case this is more readable: we're filtering so that the only things we want are fields that are not static or final, and we're collecting those into a new list and returning that list. So it's a bit more readable than it was before, and we've also removed some of the boilerplate and obviously this comment.

So let's look at the performance of this. In the first example, what I wanted to do, as I mentioned, is performance test whether this initial array allocation has any impact on performance. So I'm testing it with the array allocation and without it, and then I'm testing the refactored code with the map and collect. What we find in this particular case, for a list of ten elements, which isn't very many, is that the original code with the pre-allocation of the ArrayList size is much more performant than any of the other examples. It performs better than the simplified code without the array size, and it performs better than the refactored code. And parallel with ten items, basically, let's not even talk about it; it doesn't even appear on the chart. If we add a few more elements into the list that we're iterating over, in this case 10,000 elements, then we see that the initial allocation of the list size really starts to make very little difference, and parallel starts to become much more performant. But our refactored code is still not performing quite as well as the other examples. Again, we're not talking orders of magnitude, but it does perform a little bit worse.

In the other example we looked at, we were able to get rid of this comment by replacing the loop with a filter and collect. Here the refactored code performs worse than the original code as well. In this case I think it's because you're doing an Arrays.stream: you had code that was iterating over an array, and now you're having to do Arrays.stream, so it's probably not going to be as efficient. We'll talk a little bit more about this a
little bit later on. So when you're automatically turning your code from a for loop into a collect statement, you're going to have to be a little bit careful about the performance of that code. If you really care about performance, you're probably going to want to write some tests.

Now, here's a good time to add a little bit of a caveat. In the grand scheme of things, some of these performance implications may be pretty much nothing. Like I said, we're talking about maybe half the speed, but these are microbenchmarks: I'm benchmarking just this method. In this particular case, this is actually code which goes off to the database, so the cost of talking to a database is probably substantially more than the additional cost added by using the Java 8 features. So microbenchmarks are useful, but you have to consider things in the grand scheme of the whole application, which is why it's important for you to have automated performance tests for the code paths that matter the most to you.

Now here's an area where I think streams are very interesting: we can look at code which was previously a series of operations and collapse it down into a single stream operation. Here's our first example. We have a set here of constraint violations, whatever they are. We iterate over those, add them into a new list, then we sort that list, and then we iterate over that list and do something with it. So we should be able to collapse this down to a single stream operation. IntelliJ says that we can replace this with a collect statement. Note that it has figured out that because you have a sort call here, it can add sorted here. This is a fairly new piece of intelligence in IntelliJ; some of the older versions don't make that leap, so it's worth bearing in mind that if you have a sort call, you want to make sure that your stream operation is also sorted. And then what
do we do? We take this list and then we iterate over it and do something. So that seems to me like perhaps we can replace this straightforwardly with a forEach, which will do the same thing that the for loop was doing. So we make the same log call, and of course we don't need to assign this to anything, because it's not used anywhere anymore, and then we simply remove this. So we originally had an iteration here, some sort of iterating happening in the sort, probably, and another iteration through the list here, and we've collapsed this down to a single stream operation. So it's a bit neater and a bit more understandable, and hopefully we should see some sort of impact on performance.

The second example here is in QueryImpl. Now, this might not look like multiple operations. What you're doing is iterating over this collection, adding things to another list of fields, and then you're calling toArray on the fields list and passing that into another method. So you're doing at least two operations there: one, you're iterating over a list and creating a new list, and two, you're then turning that list into an array. Let's see what we can do to improve this. Again, we have an initial capacity on this list, so it's not straightforward to turn it into a collect straight away. I'm going to remove that initial capacity, and then I'm going to replace it; it says "replace with toArray". Now, this is some magic: this particular refactoring did improve in IntelliJ 2016.3, and I think it's got even smarter in 2017.1, because when I demoed this last time I still had to do some manual steps, and now I don't have to do any manual steps. What it's done is taken all of my original code and pretty much completely rewritten it. I removed this size allocation on the ArrayList; it's using the getName call, just putting that into a map; and then it works out that what I've done here is I've added things into a list and then I call
toArray, so IntelliJ has collapsed that straight down into a toArray call and inlined that into the retrievedFields method call as well. So there's quite a lot of stuff going on, and with some of these operations you might want to do them in incremental steps to see what's actually happening. For the first example, where we had sort of three operations collapsed into one, the performance, not surprisingly, is that the refactored code performs better than the original code. Not by millions of orders of magnitude, but it does perform better. This is good: it's more readable and marginally more performant. Now with this second example, like I said, there's a sort of naive intermediate step for refactoring this, where you might say, I'm going to collect everything into a list and then call toArray on it. So I wanted to performance test that to see what the impact was, because this is basically two operations, a collect and then a toArray, which is the same as the original code, and then I wanted to compare that against the new complete refactoring where we go straight to toArray. Here we can see that the naive refactoring with the two steps doesn't perform as well as the others, and the fully refactored one performs almost as well as the original. This is quite a high number of operations per second, so this is quite a good refactoring. Again I've had to say the performance analysis is inconclusive, even though I'm much happier with these results than the other ones, because again we're not talking about orders of magnitude better, but the finding here is that we don't have a negative impact on performance by doing these particular refactorings. Now I've just got two more refactorings to show, and then we're going to take apart some of the things that we found and come to some conclusions. This for loop to anyMatch is a new refactoring which came in in 2016.3. I'll show you an example; I'd lost it
So anyMatch is a fairly straightforward operation: it's where you are looping through some list and as soon as you find something which matches some criteria you return true or false. This can be automatically turned into an anyMatch operation. Here you can see again we've just got rid of the iteration code and kept the main core code that's in the if statement. So here I'm going to performance test the difference between the original code and the new code, which is more readable, but of course the interesting point about a lot of these Java 8 stream operations is the ability to parallelize them, so I'm going to performance test this against the parallel version as well. Now, this is for ten values in the list: the original code performs very well, the refactored code is not even half the speed, and parallel for ten values is nowhere to be seen, because the cost of splitting up a ten-value list, running over multiple cores, and then bringing the result back in is just far too much overhead for that number of values. Where it starts to get a little bit more interesting is where you have ten thousand values in this particular list, but again we don't really see any great performance improvements here. In this particular case I think it's because we're using Arrays.stream. Iteration over arrays is a nice, efficient thing to do for compilers, and for the CPU it's a very predictable thing to do anything with a straight array, and so when you incur the cost of turning that into a stream and start dealing with objects, you're going to end up with quite a lot of additional cost. You can see that in the performance here: the performance really suffers from using streams instead of using the straight array. And when we get to a hundred thousand values in here, then parallel starts to outperform refactored by quite a lot, but again we are
still not really seeing the performance of using raw array manipulation. So, anyMatch: as I said, it's not the fault of anyMatch, it's actually the problem of using Arrays.stream. Doing any sort of operation on Arrays.stream is probably going to be much more expensive than doing operations directly on the array, particularly for smaller arrays. And findFirst: this is a very similar thing to anyMatch. If I find my example, this is again where we're going to iterate over some list, and once we find the first thing which matches some criteria, we're going to return that thing. So we can replace this with a findFirst, and in this case, let's inline that, we are doing an Arrays.stream. We could also slightly simplify this by saying Stream.of. It does the same thing; it just kind of reduces the amount of code on the screen, so that's why I prefer that particular example. And so here, yes, again we've just removed everything from the if statement and put it into the filter. Another example of this is in Converters. I'm going to do the same thing here. In the previous example we had to create our own stream of classes; in this example we're getting the stream from this collection, which is a list, and here we can see we are explicitly stating what to return if we don't find anything, so there are some readability gains there as well. So let's do a quick check of the performance of this. findFirst is again not performing quite as fast as the original, in particular here because we're doing Arrays.stream. I wanted to test the difference between doing Arrays.stream and Stream.of, and the performance here is exactly the same, so pick whichever one suits you. In the second example, where we're using the Streams API, again we just don't get the same sort of performance that we got from the original code for an array of ten elements. As we start to get much bigger data, that's
where we start to see the benefits of both going parallel and perhaps using the refactored code. So again, with findFirst, in terms of performance, it's worth testing that to see if you've got any real impact on your performance, particularly for small arrays. My final example is removeIf. This is one of my favorite examples and I hope you'll be able to see why. Here what I've got is some set, helpfully called s, which I'm iterating over: I'm pulling an iterator out of it and using a while loop, which we've been doing for a while, to remove everything from the iterator which matches some criteria. I can replace it straightforwardly with a simple removeIf, where we pass in as the parameter the thing I really care about. Here we just get rid of all of this nonsense of how to traverse the collection, of how to get the next item. We just don't care about that; we just pass in the lambda expression of "remove everything which matches this particular pattern". And if we have a look here, you can see I've just got rid of all that boilerplate. If you look at the performance, it actually performs better as well, so out of all the refactorings here this is my favorite one: you get better performance and you get more readability as well. I've had to say inconclusive because the performance is not orders of magnitude better, it is just marginally better in this particular case. Let's go through and summarize these, because I've kind of flung a lot of different refactorings at you and a lot of performance data, so I really want to summarize those things before giving some sort of conclusion about refactoring to Java 8, and then after that there will be time for questions. So, refactoring to use lambda expressions instead of, for example, anonymous inner types is very easy to automate and it's pretty safe performance-wise. You're going to get more or less the same sort of performance, so under most circumstances you can just go ahead and do
that without worrying too much. Designing for lambda expressions, so for example using lambda expressions for lazy evaluation, could give you a big performance benefit. In the logging example, if you don't need to incur the cost of building up the string because you're not going to log it, then wrap it inside a lambda expression and you're going to get much better performance from that. Generally, using the new idioms, the new methods on collections, the Streams API, lambda expressions and so forth, does increase readability. I've got a few examples here. The obvious one is lambda expressions versus anonymous inner classes: it just removes all the type information that you don't really need. Using a collect instead of using a for loop and adding stuff is a little bit more descriptive. The anyMatch is again a bit more descriptive and a bit less code, but it's not so much about the code being shorter, it's more about what this thing is doing: it's just telling you if anything matches this particular criteria. The same thing with findFirst: findFirst just allows you to state that this is the thing we're looking for, something with this annotation. And of course with removeIf you just lose a whole lot of boilerplate and you're left with only the thing you care about, only the pattern that you want to remove. And then when you're doing multiple operations and collapsing those down into a single stream operation, you get quite a lot of wins. Instead of having to have two for loops with the sort, you have a single stream operation which does a map, a sorted and a forEach, and so it's much easier to reason about and probably performs better. And again here, this is where we were doing multiple operations without really realizing it, because we were collecting stuff into a list and then doing a toArray on it, and here we'll just turn that into a single operation. But you do need to be aware of the performance. Now I'm not saying you have to micro benchmark
every single instance of these refactorings, but you do need to have an understanding of which of them might have an impact on performance and which ones you might need to go away and check and test. For example, anywhere we're going to be using Arrays.stream is probably going to be slower than using an array. As I said, operations on arrays are very efficient for computers: they're easy for the compiler to reason about, they're easy for the CPU to reason about, and in memory all those array objects are next to each other too, so traversing arrays is a very efficient operation. So Arrays.stream may be taking away from that efficiency, and this is one of those examples where using Arrays.stream just gave you much worse performance than the original code. Using the forEach or the collect might be slower than iterating over a collection. This is where the answer is "it depends": there are some cases where using the forEach gave you sort of half of the speed that you had before, and some cases where you get the same sort of performance. It really depends a bit on what you're doing, on what the cost of the operation inside that iteration is. And parallel is not going to magically give you speed improvements. Parallel is not some magic incantation which will just do all the hard work for you. It will give you speed improvements if your data is very big; for example, here we've got 10,000 elements and we're starting to see where parallel is giving us some benefit. And if your operation is very expensive, then you might want to use multiple CPUs and farm it out over multiple CPUs. Those are the cases where parallel is going to be useful for you. Sometimes with these Java 8 features you get improved readability and better performance, so for example with removeIf we got better readability and we got better performance as well. So let's conclude this: should you migrate your code to Java 8? Well,
of course it depends. There's never a straightforward answer to any technical question, so it depends. You have to remember what the goal of your refactoring was. Was your goal to get better performance? In which case obviously you're going to need to test that performance. If your goal was readability or learning, then you need to check whether the refactoring has given you better readability, or whether you have actually learned anything from doing these refactorings. So you should be comparing the results of your refactorings with the original goal that you had in mind. You need to understand which things might impact performance, like using Arrays.stream, and if you're not really sure which things are going to impact performance, you need to measure that. Your tools, like for example IntelliJ IDEA, can really help you automate a lot of this process and can suggest things to you that you didn't know were possible, but you do need to be able to look at what it's doing and figure out if perhaps there's a performance impact. You need to look at it and see if this actually does help you in terms of readability or not. And in some of those examples that we showed, the refactoring was really indicating that perhaps there is a bigger design issue to address before we do these low-level refactorings. All of the reference material is available at this link here. There's a lot more information about Java 8, there'll be a link to this video on there, there are links to slides, there are links to the code; basically all the resources for this talk are available here. And now is a good time to stop and take any questions that haven't been answered yet. OK, thank you for the presentation, and now to the questions we have. I see a question: why would an anonymous inner class be any different performance-wise to a lambda? Are they not equal once compiled? Right. Well, the short answer is no, they're not. It does look like lambda expressions are syntactic sugar over anonymous inner
types, but they're not. They're actually different under the covers, and look for Brian Goetz's presentations, because he does like a whole hour on why they're not the same thing. Oracle works really hard to make sure that lambda expressions are not worse performers than anonymous inner types, but under the covers they are not the same thing, because basically a lambda expression is not a full class. An anonymous inner type is a full class with headers and memory allocation, you know, it's a full-blown object, whereas lambda expressions aren't. OK, the next question: why do we need to change our code? Why isn't it handled by the compiler? If it could be handled by the JIT then we wouldn't need any refactoring; if it could be refactored by the IDE then it could be handled by the JIT. That's a really interesting question, and I think probably the short version is that when IntelliJ suggests a refactoring to you, it is not saying that the new code is exactly the same as the old code under the covers. IntelliJ is suggesting a new syntax to use, or a different alternative syntax to use, that will achieve the same aim, but under the covers it's not compiled down to the same bytecode. In particular some of those refactorings, like the toArray one, let's see if I can find it in the slides, yes, this one: this code here on the right does the same thing as this code on the left, but it's not the same code. There's one step missing; we're going straight from a map to a toArray, so the compiled code is not the same thing. And that's because IntelliJ gets to suggest to you, the developer, "this is something I think provides the same behavior as the original code, but I need you, the developer, to decide whether or not this is something that you really want to apply". And as you saw with some of these, the performance shows that these are not the same operations under the covers, so you need to be
able to, you need to be the one to make those decisions. And the JIT probably can make decisions for some things, not from the examples I gave, but for some things. But the JIT is not going to go away and say, "oh well, I think that this should be an Arrays.stream operation", because it's clearly not more efficient. Some of these refactorings I'm suggesting are to aid us, the developers, in reading things and to give greater readability to the code, not to aid the JVM in running faster. OK, next question: should we prefer method references instead of lambda syntax? Oh, I like that question. When I first started with Java 8 I really disliked method references. I thought this weird double colon thing was kind of unreadable and difficult to get my head round, but I have generally over time come to prefer method references over lambda expressions where IntelliJ suggests that I use a method reference. There are a few examples of this which I can show. So in this example the map is not so useful, but let me see if I can find an example of filter. Let's say we have a filter, so in the filter I might say, what have we got again, a key, I might say the key is not equal to null. That's kind of fine, but IntelliJ suggests to me that I can replace this with a method reference which says Objects::nonNull. Now in this case I much prefer the method reference, because I think it says to me very clearly "I want you to filter for all the non-null objects", so I know that the only objects I'm getting in this map operation are not null. Whereas in the previous version, it doesn't take very long, but I have to literally reason about what this means: not equal to null, or equal to null, does that mean the non-null ones are going through or not going through? And there's another example where I really like this as well, which is, let me see if I can do sorted, Comparator.comparing, and then I can do something like, whoops, well, what have we got on the key? I get
ID, well, that's not a great example because I just pulled it out of thin air, but with Comparator.comparing you can use a method reference to say "this is the method I want you to compare on". So instead of having to do comparing like o1.getId minus o2.getId, or was it o2 minus o1, I'm not really sure, the comparing with the method reference to me makes it really clear that this is what I want to compare: I'm comparing them on ID. And also I can do reversed, which means I can really easily sort these descending or ascending. So those are some of the places where I really like method references. I personally prefer method references over lambda expressions most of the time, but basically it's going to be whatever seems to be the most readable for you. OK, thanks. Next question: can you provide more details on the benefits of moving to Java 8 in terms of performance, without refactorings? So I'm going to assume from that question that you've got an existing code base and you're going to run it on Java 8 without making any changes, so without applying the lambda expressions, without using the Streams API, so basically Java 6 code running on a Java 8 compiler and VM. Now, the benefits that I showed, let's go back to the beginning, yes, here we go, it was like the third slide. So a lot of these changes, the performance improvements in common data structures, changes to support concurrency, a bunch of these things, and again there are links to more information from the page that this links to about these speed improvements in particular, you get without having to do anything in particular, because things like HashMap perform better. So presumably you're already using HashMap inside your Java 6 style code, and moving to Java 8, HashMap would generally perform faster without you having to do anything. OK, next question: how will you approach a situation where you
need to iterate over a stream multiple times to perform different operations on it? Collect it to a collection first and then perform the processing, or is there a better approach? I'm not sure; I would like to see the actual code example. But I think that some of these refactorings that we saw here are a bit magic and you just kind of get three steps in one; it just does it all for you. And I think that as you're getting to grips with some of the Java 8 features it's important to do things incrementally. So if I was thinking in terms of a collection or some objects requiring multiple operations to be performed upon them, I might write that in the code as, let's say, three separate operations, and then I'd have a test to make sure that's working, and then I'd see if I could refactor it into a single stream operation. You should be able to reason about which things are inhibiting that. So if I look at some of my examples, like this example: I know that there's another for loop here, so I feel like this should be able to be refactored into a single stream operation, but I know that the way that this is structured right now I can't do that, because it's this mapped field, which I need access to here and here, which stops me from inlining this for loop. And similarly I have to do a forEach instead of doing something more streamy because I'm passing in this set. So the problem with this particular method is that I have a number of collections that I'm doing stuff over: I've got this found name thing, I've got this set here, and I've got this mapped field thing, which also has lists inside it. So I've got a number of different collections that are interacting with each other and I'm having to iterate over all of them, and in this case I know that in order to perform that as a single stream operation I need to get rid of some of those collections or collapse them together. So it
would, well, the answer is of course "it depends", but I'd be tempted on the first pass to write stuff out as a series of individual operations and then see if I can figure out which of those will collapse down into a single operation. OK, good. Again a question about comparing method references to lambda expressions, now performance-wise. Oh, um, I think they're the same, I don't know. I think there's some literature from Oracle on that, but I think that they perform the same. I don't know; I shouldn't say anything, because I did look it up once and I can't remember. OK, so there were a couple of questions about the performance issues mentioned. The first question is: is the lower performance attributed to Arrays.stream only? How about Collection.stream? Right, so the worst performance results that we had were definitely around Arrays.stream. I'm just going to try and see if I can find some of these examples. So this is one of the worst examples: this is where we're using Arrays.stream, and regardless of whether we were going parallel or not we just couldn't get performance which got anywhere near the original array. In this case I believe that's because iterating over an array, like I said, is very, very efficient, and it's very difficult to match that performance. But there are other cases, as you're pointing out, where there's no Arrays.stream as well. So this is a map and a collect, and it's not using Arrays.stream, this is Collection.stream, and it still is not as good performance-wise. To look at the code, here we go, it's just keys.stream. So using streams does have an impact in terms of object creation, and it does have an impact in terms of performance. There is no free lunch, of course. So yes, there is a cost to using streams generally, not just Arrays.stream, but Arrays.stream is particularly bad, which is why I call that out. OK, and another
question: are all these performance issues possible to solve in future versions of Java 8 or 9? OK, yes. So the performance of Java 8 has been getting better, even just within Java 8. You know, you get all these different updates for Java 8; I'm now running release 112, whatever that means. So even without having to go to Java 9 we're getting improved performance under the covers, with things like lambda expressions and the Streams API. One of the nice things about the Streams API is that, because how it's implemented is hidden away from you and you just interact with what you want to achieve, they can continue to iterate on improving the performance of the Streams API without impacting you using the Streams API. They can do that under the covers, so that's definitely going on. There's all sorts of other stuff coming into Java 9 and Java 10 which will improve the performance too. One of the things I do want to emphasize, though, is that even in a case like this, where it looks like this refactored code has really horrible performance compared to the original code, and I did say it earlier but it's worth reiterating, micro benchmarks like this don't necessarily mean that your application is going to perform slower if you do this refactoring. In the grand scheme of things your application does a lot of other things: it talks to other services, and network calls are expensive, you know, it talks to databases, it has to respond to the user. There's all sorts of other stuff happening, and there are lots of interactions between things. So the fact that these three lines of code, or whatever, don't perform quite as fast as the original lines of code may not have any impact on the overall performance of your application. And you might find that some of the improvements around things like garbage collection, and the improvements around the speed of some of the collections that you're using, might offset some of these performance tests which
show that the performance isn't quite as good. So you need to measure stuff in the overall scope of your whole application, not just these micro benchmarks. OK, one more question: what about the performance of lambdas and anonymous classes? Are lambdas bad or good for performance, or does it depend on how we implement them? It seems like it's all about readable code versus better performance. It does feel a bit like that. I would say that, well, in my example of using lambda expressions, let's go back, my lambda expressions performed better. So in this report here there's a whole bunch of cases where the lambda expressions perform better than the anonymous types, so you get increased readability and better performance. I actually had to search through this document to find one example where the lambda expression didn't perform quite as well as one of the anonymous inner types. So for lambda expressions in particular, Oracle worked really hard to make sure that they were no worse in performance than anonymous types, and they're often better. And as I said, the performance of these is getting better as well because of optimizations to the JVM itself. Yeah, thanks. One question: the results for parallel are calculated in isolation. Would it make a difference if we use parallel in a production app? I guess then with parallel I might begin consuming or using threads that could have been used for other operations. I'm not sure I quite understand. Is the question whether running parallel in production is different to running parallel in these tests? As I understood the question, it is: let's say we use parallel, so we need to leverage other threads, and so we wonder if we affect the performance of the application by doing that, considering we consume a thread which could be used for another operation. I don't think it works that way. Maybe I'm misunderstanding the question, but what parallel is doing is
under the covers it uses the fork/join framework, so it will be spinning up, if you like, its own threads. The whole point of parallel is to run the work on individual CPUs or cores, if you have a few physical CPUs. So for example on my MacBook I can get parallel to use four cores nice and easily and just basically make the most of the hardware. That's the point of parallel: to make the most of the hardware. Now of course if you've got parallel threads making the most of all of the CPUs, it could be pushing other threads off the CPU so they don't get any time. I don't really know how the scheduling works underneath the covers. But it's not going to be interfering with data on other threads, so it's not going to be reusing other threads and interfering with them from a data point of view. It might be stealing the CPU away from them, but I don't believe you have to worry about things like thread safety in the way that we traditionally had to. Yeah, I think you are right here. And OK, let's move to a close. Oh, there's a last question: does IntelliJ IDEA provide suggestions to refactor from Joda-Time to the Java Time API? So I could try to answer the question: IntelliJ IDEA doesn't suggest it yet. There is an issue in the issue tracker, so feel free to go through it, and we can definitely implement that. Yeah, I really wanted to show you some date and time stuff because I really like the new date and time API, but I don't have any really good examples. I did see that there's the use of the obsolete date/time API, but that's not Joda-Time, that's java.util.Date and java.util.Calendar. Yes. I think that's all; there are no more questions. In case you have other questions, feel free to ask them on Twitter. We'll be publishing the recording soon, and if you find some question not addressed, please share it with us. Actually, you can always contact me over Twitter. Cool, so thank you for your time.
Thank you, bye.
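The refactorings discussed in the talk (for loop to anyMatch, findFirst with an explicit "not found" value, map straight to toArray, removeIf, and Comparator.comparing with a method reference) can be sketched as below. This is a minimal illustration, not the code shown in the talk: the class name, method names and sample data are all hypothetical, and it says nothing about the performance figures quoted above, which would need to be measured separately.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical names throughout; a sketch of the idioms, not the talk's code.
final class Java8RefactoringSketch {

    // Before: a loop that returns as soon as one element matches.
    static boolean containsLoop(String[] values, String wanted) {
        for (String value : values) {
            if (wanted.equals(value)) {
                return true;
            }
        }
        return false;
    }

    // After: anyMatch keeps only the condition from the if statement.
    // The talk measured Arrays.stream as slower than the loop on small arrays.
    static boolean containsStream(String[] values, String wanted) {
        return Arrays.stream(values).anyMatch(wanted::equals);
    }

    // findFirst: the old if-condition moves into filter, and orElse states
    // explicitly what is returned when nothing matches.
    static String firstStartingWith(List<String> names, String prefix) {
        return names.stream()
                    .filter(Objects::nonNull)
                    .filter(name -> name.startsWith(prefix))
                    .findFirst()
                    .orElse(null);
    }

    // map followed directly by toArray, with no intermediate list.
    static String[] upperCaseNames(List<String> names) {
        return names.stream().map(String::toUpperCase).toArray(String[]::new);
    }

    // removeIf replaces the explicit Iterator, while loop and iterator.remove().
    static void dropEmpty(Set<String> s) {
        s.removeIf(String::isEmpty);
    }

    // Comparator.comparing with a method reference, reversed for descending order.
    static List<String> longestFirst(List<String> names) {
        return names.stream()
                    .sorted(Comparator.comparing(String::length).reversed())
                    .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("id", "name", "createdAt");
        System.out.println(containsStream(new String[] {"id", "name"}, "name"));
        System.out.println(firstStartingWith(names, "n"));
        System.out.println(Arrays.toString(upperCaseNames(names)));
        Set<String> s = new HashSet<>(Arrays.asList("", "id"));
        dropEmpty(s);
        System.out.println(s);
        System.out.println(longestFirst(names));
    }
}
```

Keeping the loop version next to the stream version, as with containsLoop and containsStream here, makes it straightforward to benchmark both before committing to the refactoring, which is the approach the talk recommends for anything built on Arrays.stream.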
Info
Channel: IntelliJ IDEA by JetBrains
Views: 18,799
Keywords: JetBrains, IntelliJ IDEA, Java, Refactoring
Id: 2xOtyGUTpQU
Length: 70min 41sec (4241 seconds)
Published: Tue Feb 14 2017