JDK 8: Lessons Learnt With Lambdas and Streams

Captions
What I'm going to talk about this morning is what I've learnt about lambdas and streams in JDK 8. So let's have a quick show of hands: who here is using JDK 8? Aha, good, excellent. Who's tried JDK 8, if they're not using it at the moment? OK, good. So I'm going to assume that most people here have at least got some experience of JDK 8 lambdas and streams, which is very good for me because it saves me having to go through the basics of what lambda expressions are and what the streams API is. I've only got half an hour, so I've had to shrink this presentation down a bit, and I've left out that little bit at the beginning.

Now, the idea behind this really comes down to a phrase, a saying that I came across a long time ago. No idea who said it, but it's very relevant, because it says: a clever man learns from his mistakes; a wise man learns from other people's. So today I am going to be clever; you are going to be wise.

Now, the first thing to talk about is how you can use lambda expressions for delayed execution. This is a nice little feature that was added in JDK 8, which you might not be aware of and which might be very useful in certain situations. So here's what we might do in Java. Now, I know we all use the standard logging API in Java; we wouldn't use any of those other, you know, third-party things like log4j or anything like that. So this is a typical piece of code that we might have if we were using the standard logging API, and this kind of demonstrates Heisenberg's uncertainty principle applied to software. For those of you who are not quantum mechanics majors, Heisenberg's uncertainty principle states that the act of observing something changes what you're observing, and that actually applies just as well to software. The problem we have is, if we call this method logger.finest, what we need to do is pass a message to it so that that message can be logged if it's needed. And so we have a method called getSomeStatusData, and that could be a very heavyweight operation
could take a lot of time to generate the message. In order to pass the message to the finest method, we have to call that method; we have to generate the message. Then, when that message gets passed to the finest method, the finest method will look at it and go, "Let's see, are we at logging level INFO, or are we at logging level FINEST?" and decide whether to use the message. So regardless of what level you're at, you're always going to have to generate the message. The problem is, if you set the logging level to INFO, you're going to have a performance impact, because you're generating this message that you don't actually use: as soon as it gets to finest, it throws it away and returns. So whatever we do, we have to call that method; we have to generate the message.

Now, in JDK 8 there is a functional interface called Supplier, and Supplier is a very simple interface. It has one method, which returns a value; it doesn't take any parameters, it just returns a value. And all of the methods in the standard logging API in Java have been modified so that, in addition to having versions of the method that take a String, they also have versions of the method that take a Supplier. Now, Supplier being a functional interface means we can use a lambda expression to represent it, and so we can actually change our code very subtly. We can literally add, what is it, like six characters, and we can change the way that works. So rather than passing the result of getSomeStatusData, we pass a lambda expression. Now, the lambda expression, because it's a Supplier, doesn't take any parameters, and the body of the lambda expression is to call getSomeStatusData. So it's a very small, subtle change in terms of what we're doing in our code, but it can have a big impact in terms of performance, because now what we're doing is passing a description of how to create that message, not the message itself. That way, when finest gets called and it decides it's at level
INFO, so it doesn't need to do anything, it returns without doing any work. If it needs the message, it knows how to create it: it calls the method on the Supplier, generates the message and then logs it. So it's only when we need that message that we actually create it. So that can definitely improve performance in that type of situation where we're using a logger. But the nice thing about this is, if you have a situation where you're passing a value to a method and you're not sure whether you're going to use it, and it's something which has a big performance impact, use a Supplier: pass the description of how to create it, and then use that only when you need the actual value in your method. So it's a nice way of avoiding the overhead of calling that method to create something every time.

Now, the next thing I want to talk about is avoiding loops in streams, and this is all about how you think when you're writing code. So I did a show of hands, and pretty much everybody here is using JDK 8, so you've done some lambdas, you've done some streams. But here's another interesting question: who here would describe themselves as a functional programmer? OK, I mean, not many people, actually. Interesting. And that is actually an issue, because lambdas and streams introduce a functional style of programming to Java. It's something we haven't had before, and as Java programmers we're very used to an imperative style of programming. Functional programming is different; you have to think differently. If you look at the differences between functional and imperative programming, one of the key things with functional programming is that you shouldn't modify state. So when you do things, there must be no modification of state. It's the idea of a function as a mathematical function: you can pass in the same value as many times as you want and get the same value out. So the function itself doesn't maintain any state which would affect the result that's being generated by it. So that's very important
and it's also very important from the point of view of allowing things like streams to work very well if you want them to do things in parallel. If you're not modifying state, then you can do things in parallel without having to worry about contention, locking and all those sorts of things. The problem we have as Java developers is that state is very useful. You know, think about it: when you write code, you've got variables and you manipulate those variables, and that's what we do in Java. So having state is very useful, and telling people they can't use state makes them have to think a bit differently. So let's look at some examples of how that works.

When I first started using JDK 8, back when it was first released, I was working for Oracle at the time, and I thought to myself, right, what I really want is a list of all the places where I can use lambda expressions, and a list of all the places where I can get a stream from, because there were a whole bunch of things that we introduced as new APIs and nobody had actually documented that; there wasn't a nice list available. So I thought, right, I'm going to do this myself, and I wrote some code which literally scanned through all of the Javadocs and looked for all the functional interfaces, all of the places where you could use functional interfaces as parameters to methods, and all the methods that returned a stream as an object. So I generated all this information, and I wanted to print out a summary of it. One of the things I wanted to do was to count the number of methods that returned a stream, and I thought, right, I'm using lambdas and streams, I'm going to use them to actually write the code. And here's the kind of code I came up with, my first attempt. So I said, right, I've got a HashMap which contains keys, which are the classes in the Java standard APIs, and the value associated with each key is a list of methods that return streams. So if I want to count up how many methods there are, what I
do is I take my key set and then iterate over that. So I thought, right, I'm going to do that with a stream: I'll take the key set, create a stream from it, and then for each of those I want to find out how many methods there are and count them up. So I thought, I'll use forEach, and then I'll look at how many methods there are and add them up. But I know that I shouldn't be modifying state, so what I'll do is use this other class called LongAdder. This was introduced in JDK 8 specifically for the sort of situation where you've got many threads updating the same variable at the same time, with frequent writes and infrequent reads. The advantage is that the LongAdder will allocate an instance of the variable to each thread, so each can update it independently without locking, and then when the value is required it combines them all together and returns a result. So I thought, that's thread-safe, that's all good, I'll do that. And I got the result; it worked, nothing wrong with that code at all.

And I showed it to one of the engineers at Oracle, who is one of the streams engineers, and he said, "Oh no, no, no, this is wrong. You are modifying state. This is not functional programming." So I thought, OK, I will have another crack at this then, and I went back and thought, right, I'll do it this way, so now we have proper functional code. In this case, we take the stream and, rather than doing forEach, we say: take the class that's the key in our set and map it to an int, which is the number of methods that return a stream. So we get those values as a stream, and then we pass them into a terminal operation, sum, which just adds them all up and gives us a result. So in this case we now have functional programming, so this is all good.

But then I had another part of this where I wanted to do two things at the same time. So now what I was looking at is the methods that take a parameter which is a functional interface, so we
can use a lambda expression to represent it. But in addition to printing out those methods, I also wanted to count how many of them were new in JDK 8. So I'm actually trying to do two things at the same time. A similar approach for my first attempt, which was to say: OK, we'll create our stream, we'll use a LongAdder again, we'll do a forEach. This time we've got a slightly more complex lambda expression: we're going to print out the name of the method, and then we're going to test to see whether it's new and, if so, add one to our LongAdder. So we know this is wrong, so we have to do it a different way.

So I thought, OK, let's try a different approach. I thought, right, we need to remove the state from outside of the stream, so I did it this way. I thought, OK, I'll do a mapToInt, same as I did before, but this time I'll record a value which is either going to be 0 or 1, for whether it's a new method or not, and then I'll print out the name of the method and return whether it's a new method, 0 or 1. That way I get a stream of zeros and ones, and I can pass that into sum, add them all up and get the result. So I showed this to my favorite engineer and he said, "No, no, no, it's still not functional, because although you're not modifying state outside of the lambda expression, you shouldn't modify state even inside the lambda expression. It's thread-safe, but it's still not functional." So I thought, OK, right, so I need to have another go at this then. So the problem is we've got this state; internal state, but still state.

So, third attempt. Now I thought, right, what I'll do is use another method which is very handy, which is peek. Now, peek allows you to look at the elements of the stream as they go past. It's very useful for debugging, because you can actually print out what's in the stream, so you can see things as they happen. But in this case what I'm doing is simply printing out the name of the method. So that's all good, I've done the printing of the
method. Then I do a mapToInt, and this time I'm just going to use the ternary operator, and I'm going to say: if it's a new method return 1, else return 0. That way I've got no state involved in here. So I thought, great, I've done it, now I'm functional. I went back, showed it to my favorite engineer, and he said, "Yeah, it's good, but it's not functional." I'm like, "Why is it not functional? There's no state." He said, "Ah, well, you see, the problem you have there is you have a print statement. A print statement is a side effect, and in addition to not modifying state, to be a functional programmer you should not have side effects either." So I looked at that and thought, OK, so you're saying that I can't print something out, but what I actually want to do is print something out, so you're saying I can't do what I want to do. So I said to him, "OK, I'm now beyond my level of functional programming skills. You tell me: what do I need to do to this to make it purely functional?" He said, "Ah, you need an IO monad." At which point I went, "I'm good."

Right, next: the art of reduction, and again, the need to think differently. Now, when I was at Oracle, one of the things we did with the rollout of JDK 8 was to run some workshops where we gave people some exercises in using the streams API. One of those was a nice simple problem: you had a collection of strings in a file, and what you wanted to do was find the length of the longest string in that file. Fairly simple thing. We gave people a hint, which was that the BufferedReader class now has a new method called lines, which returns a stream of strings from the source. So if you link a BufferedReader to a file, you get the lines in the file as a stream of strings. Great. And the answer to that is actually pretty straightforward. What you do is simply say: take your reader, create a stream source, generate the lines, and then pass that into mapToInt, so you're converting the stream of strings into a stream of the lengths of those strings. So we
use the method reference there, String::length, and then we pass that into the terminal operation max, which gives us the maximum value. Great, so now we're finding the maximum length of a string in our file. And then, because max returns an optional, we need to extract the value from that, so we just do getAsInt. Good.

Now, when we were doing that, somebody came up to me and said, "Ah, well, that's an interesting puzzle, but what if I change that very slightly? Rather than finding the length of the longest line in the file, what if I just want to find the longest line in the file?" Hmm. So I thought about that, and I thought, OK, that's an interesting thing to do, because, as we'll see, state in that case is very useful. So I came up with what I call the naive stream solution. I thought, right, I want the longest line in the file, so what I'll do is pass the stream of strings into sorted, and I'll sort them by length so the longest one is the first in that stream. Then I can pass that into findFirst, which simply returns the first element in the stream. That's an optional, so I get that result out and I get my longest line in the file. So that's great, that works, job done, right? Yeah, OK, well, not really, because obviously if you've got big files you're going to have to sort them. That's going to require a lot of resources and a lot of time; it's not going to be the most efficient way of doing things. So I thought, there must be a better approach.

So the first thing I thought was, right, let's go back to basics: how would I do this in Java if I'm not using streams? And this is exactly what we would do. I create a variable called longest, it's a String, and I make it an empty string to start with. Then I iterate over the strings in my file. So I say: while there is a line coming from the file and it's not null, look at the length of that string; if it's greater than the longest string that we already have, then change the longest string and continue. When we get to the end of that, we
are going to have the longest string in our file. So I look at that and go, well, that's great, you know, four lines of code and I've done that. Why would I need to use streams? What's the advantage of using streams over four lines of very simple Java code? And the answer is that this is simple, yes, but it is inherently serial. This is the problem that we have with doing an explicit loop like this: we can't give that to the compiler and say, "Make it parallel." So if we had a situation where we had lots of strings, maybe in a list rather than a file, we couldn't break it up into parallel operations and improve the performance by using multiple cores and multiple processors. Even if we did rewrite our loop, because we've got state involved it's not thread-safe, so we've posed another problem in terms of multi-threading. So this is the real drawback to doing things this way, and why people have been excited about the idea of streams is the simplicity of being able to switch to a parallel approach rather than a serial approach.

So, actually, let me go back. One of the things that I then thought is, OK, how do you iterate over something in such a way that you don't need to maintain explicit state? And this is a very good interview question. If you're interviewing people on Java, or even any sort of programming, it's a good question to ask. I know this because Google asked it of me when I interviewed with them. And the answer is recursion. You can use recursion to, in effect, maintain state as you go through doing something. So I thought, right, let's write the code using recursion, and I wrote a method called findLongestString, and that took the initial string, or the longest string we had so far, and the source of the strings. And then it said: right, get the next string; test if the string is null; if it is, we've reached the end of our list, so go back up through the chain of recursion and get the result. If the length is longer than our current longest string, then we set
longest to the value we have, and then we call the method recursively to look at the next string. So again I thought, right, if we use that, there's a simple way of using it: we simply say findLongestString, pass in an empty string, pass in the BufferedReader so that we can get our strings, and we get our result. So again I thought, right, no explicit loop, no mutable state, we're all good, finished, don't need streams. So we can do it without streams; can we make this parallel? Unfortunately not, no, because if you've got a very large set of data then you are going to run into problems. You are going to hit an out-of-memory exception; you're going to hit a stack overflow, because each line in the file generates a new stack frame when you call the method recursively. Too many stack frames.

So there must be a better stream solution. The whole thing about this is, right, let's use streams. The streams API, if you look at it, uses a well-known pattern: filter, map, reduce. It's been around for a long time and it's well understood. In the case of this particular problem, we're not actually worried about doing any filtering, so we don't need to reduce the set of data; we just want all the strings in our file. We don't need to worry about mapping, because we've got the strings in our file and that's what we're interested in, so we don't need to bother doing any mapping. What we really need to do is reduce: we want to reduce from the set of strings in the file into one string, which is the longest string in that file. The problem we have then is finding the right reduction in order to do that.

Now, if we look at the reduce method in the Stream API, what you'll find is the signature looks like this. So reduce takes a single parameter, which is a BinaryOperator of type T called the accumulator, and it returns an Optional of type T, because there could be a situation where you don't have any input, and returning a null would be rather bad. So we return an Optional, and if
you look at the definition of BinaryOperator, BinaryOperator is a sub-interface of BiFunction. A BiFunction is a function that takes two parameters as input and returns a value, and in the case of BiFunction the types of the parameters and the return type can be different. For a BinaryOperator, all of the types are the same, so we're dealing with objects of the same type being passed as parameters and an object of the same type being returned. So then we say, right, how do we solve this problem? We need to figure out what the accumulator is in order to use the reduction. And again, if you look at the documentation for reduce, you will see it says that the accumulator takes a partial result and the next element in the stream and returns a new partial result. And if you go back to what we did with the recursive approach, you'll find that it's a very similar description, except this time we're not generating stack frames, because we're not doing it explicitly recursively; we're using the stream to do, in effect, the same thing for us.

So we look at that and find that the answer is to use this type of reduction. So we say: OK, longest line is, take the reader, create the stream of lines, and then just call reduce, and the lambda expression that we pass in is our binary operator. We're passing in two values, x and y, and then we simply say: if the length of x is greater than the length of y, return x, otherwise return y. So we've got two values and we return the longer of the two. Now, the key thing here is the value x, because x is, in effect, what maintains the state for us. So the stream is doing that. The stream is saying: OK, let's start at the beginning of the stream, look at the first string, that's our partial result; then compare it to the next one, take the longer of those two strings as our new partial result; look at the next one, do the comparison, and keep moving down the stream until you get to the end. There are no stack frames involved; it's not
doing it recursively, but x is in effect maintaining the state for us, which is the key thing there. But it's not explicit, so from our point of view we don't have to worry about multi-threading. If we wanted to make that parallel, it would be very easy: we could simply add parallel to the processing of it, and then the fork/join framework underneath could split it into multiple threads of operation, and all the work would be done for us.

So I actually wrote a blog post on this, and I showed it to my favorite engineer, Stuart Marks, and I said, "Look at what I've done. I've gone through this whole process of understanding how to go about finding the longest string in our file, and I've come up with this perfect functional result." And he said, "Yes, that's very nice, you've done very well, except there's an easier way of doing it, which is to use the max method." So you can use max on its own, which will take a stream of primitive numbers and return the maximum value. But there's also a version of max, on a stream of objects, which takes a Comparator as a parameter. So you can say: take the lines, pass them into max, where you compare the values based on the length of the string. comparingInt is available as a static method on Comparator, so you can use that, and it will compare the strings by length but return the string itself. So you're doing the comparison based on length but still returning a string as the value, in an Optional, which you can then get out. So this is really just a learning exercise for me, and hopefully for you as well, to see how you can go through and understand the functional approach to using streams.

So, just to wrap up then: the key things here are that lambdas and streams are a very powerful combination. It does require you to think a little bit differently. This whole idea of avoiding state requires you to think differently from the way that you do in Java
normally. The idea of how you approach things from a loop perspective is a little bit different. And so the key thing here is that if you're writing some code and you're using a stream and you think to yourself, "Oh, I'll use forEach in that stream," stop and think to yourself, "Now, do I really mean that I want to use forEach?" Because forEach is very easy to use when you think in terms of loops, but it might not be the right approach. Now, there are some very valid places where you would use forEach. If you just want to print all of the elements of your stream at the end, forEach is the perfect thing to do. If there wasn't a valid reason for having it, then it wouldn't be there. So just be careful when you use forEach that you're using it for the right thing rather than the wrong thing. Think about reductions; think about how you use a binary operator to reduce the stream of data into a smaller set of data or one piece of data.

Be careful with parallel streams. I didn't have time to go into this, because I've only got half an hour for this presentation, but parallel streams are a whole different kettle of fish, and there's a longer version of this presentation, which you can find on YouTube, where I talk more about parallel streams and the way some of it works underneath. Just be careful with that, because most people seem to think that if you make a stream parallel it will make things go faster. That's not always the case. One thing I will guarantee is that if you make a stream parallel, there will be more work involved in that stream. It's just that if you've got a lot of data, or something that takes a long time to process, you may well finish that work more quickly. There's more coming in JDK 9: there are some new APIs in terms of streams which will make things a little bit easier, and probably new things coming in 10 as well.

And so I've got about two or three minutes left, so if anybody's got any questions, I'm happy to take questions.

Yeah? I'm sorry? So the loop example, yeah, yeah. Oh, I see,
yeah, yeah. The idea was the side effect of having a println statement. Yeah, as I say, you would need something called an IO monad. I don't think it's necessarily an issue, because in the case of what we're doing with that particular piece of code, printing things out doesn't really matter. I think it's only if you were trying to write purely functional code that you would want to replace that, but in terms of that particular piece of code it's not something that's going to cause any issues, because if you did things in a parallel stream it would still work quite happily. Sorry? Yes, you could in theory replace it, but I didn't bother going down that route.

Anybody else? One more question. There would be significant bytecode differences, because it would do it differently underneath. I haven't looked at the difference in terms of performance. Just off the top of my head, I'm tempted to say that max would be more efficient, but I can't guarantee that. The only reason I would say that is because I know that the engineers spend a lot of time working on the specific APIs to improve their efficiency as much as they can, so I would tend to think that would be more efficient, but I can't guarantee it. Yeah, without looking. I mean, I might go and look at the code and find that is exactly what it does, yes.

OK, so I am at time. Come back in about 40 minutes; I'm doing a presentation on JDK 9. So see you later. Thank you.
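
For readers without the slides, the main techniques described in the talk can be sketched roughly as follows. This is an illustrative reconstruction under stated assumptions, not the speaker's actual code: the class name LambdaLessons, the call counter, the logger name, the sample data and the use of String keys (the talk used classes as keys) are all made up for the example. Only getSomeStatusData and the x/y reduction follow names spoken in the talk.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.logging.Level;
import java.util.logging.Logger;

class LambdaLessons {
    // Hypothetical counter: records how often the expensive message is built.
    static int statusCalls = 0;

    // Stand-in for the heavyweight method named in the talk (body is assumed).
    static String getSomeStatusData() {
        statusCalls++;
        return "status data";
    }

    // Delayed execution: the String overload always builds the message,
    // the Supplier overload builds it only if FINEST is actually loggable.
    static void logBothWays(Logger logger) {
        logger.finest(getSomeStatusData());        // built eagerly, every time
        logger.finest(() -> getSomeStatusData());  // built only on demand
    }

    // Counting without external mutable state: map each key to a size, then sum.
    static int countMethods(Map<String, List<String>> methodsByClass) {
        return methodsByClass.keySet().stream()
                .mapToInt(k -> methodsByClass.get(k).size())
                .sum();
    }

    // Longest line via reduce: x carries the partial result along the stream.
    static String longestByReduce(BufferedReader reader) {
        return reader.lines()
                .reduce((x, y) -> x.length() > y.length() ? x : y)
                .orElse("");
    }

    // Longest line via max with a Comparator on string length.
    static String longestByMax(BufferedReader reader) {
        return reader.lines()
                .max(Comparator.comparingInt(String::length))
                .orElse("");
    }

    public static void main(String[] args) {
        Logger logger = Logger.getLogger("sketch");
        logger.setLevel(Level.INFO); // FINEST messages are discarded at this level
        logBothWays(logger);
        System.out.println("message built " + statusCalls + " time(s)");

        // Illustrative data only, not the results of the speaker's Javadoc scan.
        Map<String, List<String>> methods = new HashMap<>();
        methods.put("java.io.BufferedReader", Arrays.asList("lines"));
        methods.put("java.util.Collection", Arrays.asList("stream", "parallelStream"));
        System.out.println(countMethods(methods));

        String text = "a\nlonger line\nmid";
        System.out.println(longestByReduce(new BufferedReader(new StringReader(text))));
        System.out.println(longestByMax(new BufferedReader(new StringReader(text))));
    }
}
```

One caveat on the two longest-line variants: when several lines share the maximum length they may pick different lines (the reduce lambda as written keeps the later one), so they are only interchangeable when the longest line is unique.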
Info
Channel: SpringDeveloper
Views: 78,315
Rating: 4.9166069 out of 5
Keywords: Web Development (Interest), spring, pivotal, Web Application (Industry) Web Application Framework (Software Genre), Java (Programming Language), Spring Framework, Software Developer (Project Role), Java (Software), Weblogic, IBM WebSphere Application Server (Software), IBM WebSphere (Software), WildFly (Software), JBoss (Venture Funded Company), azul, java 8
Id: wZKmA6XodNE
Length: 30min 37sec (1837 seconds)
Published: Wed Dec 21 2016