ElixirConf 2019 - High Performance String Processing Scripts in Elixir - Johanna Larsson

Captions
This talk is about high-performance string processing scripts in Elixir, which is kind of a mouthful, but it does sum up a lot of the components of this talk. It's mostly about string processing and how to optimize it. The good part is that we'll be looking at an underperforming script, making small incremental changes to it, and seeing how that changes the execution time, along with some other factors.

I don't know about you, but I just really love optimizing things; I love making things go faster. I know we talk a lot about premature optimization, you know, "the root of all evil"; we've heard that already today. But sometimes you get these legitimate cases where something isn't working the way it should and you need to make it faster. This is one of those times. And besides, it doesn't really matter: this is a conference, we should have some fun, we should indulge ourselves a little bit. There's just this thing where you set up a benchmark, you run your code, and you get a number, and the number tells you how good you are. Then you make a change and you run it again. Sometimes it doesn't really make it faster, or it even makes it slower, and you've learned something; that's good. But sometimes that number is lower, and that's just amazing. You get this rush of dopamine. It's a good feeling.

My name is Johanna. I'm a co-organizer of the Malmö Elixir meetup; that's my Twitter and my GitHub. I work for a company called Castle: we do protection against account takeovers and credential stuffing, security workflow automation, all kinds of stuff. We have our headquarters in San Francisco, but we also have offices in Malmö in Sweden and Kraków in Poland, and we are hiring, I have to say that.

So let's talk about you, the audience. You've at least been through the official documentation, you have an idea of what Elixir looks like, and maybe you've come across a couple of those places where Elixir doesn't perform to the degree that you expected; it might be performing worse than other languages with similar code. We'll be talking about that.

This is the agenda. First of all, if you're doing a talk about optimization, you have to start with some caveats, so I'm going to check that off first and then we can stop caring about it. Then I'll give you an introduction to the challenge, the script we'll be working on, and then we'll go in three different directions. The first one: try to preserve the original script but make small changes to it to make it faster. Then we'll take a look at a different direction where we use more memory to make it faster, which is not always an option. And finally we're going to go concurrent, because this is Elixir, so you should go concurrent.

I'm sure you've seen this quote by Joe Armstrong; I think it's from Erlang and OTP in Action, though I'm not 100% sure. I see it as a guiding light when it comes to optimization: when you're building something, focus on building something good first, and then, once you have it working, take a look at whether you have any bottlenecks. If you do, that's the time to fix them.

Like I said, I want to talk a little bit about how to do optimization. This isn't the perfect method, just a couple of things to keep in mind while you're working, in no particular order. I think it's very important to set a goal: if you have something that runs in a hundred seconds and that's too slow for you, how fast does it need to be? Is ten seconds fast enough? Is one second? If you don't know where you're going, there's no way you're going to get there.
You also need to define a method. That could be a micro-benchmark; those can be unreliable, but they're easy to use. If you're lucky you can test in production, but that's dangerous. And you have to keep the context in mind. By that I mean: how much are you willing to pay to speed this up, if you can even throw resources at the problem? Using twice as many resources often doesn't make it twice as fast, so how much are you willing to pay for 10% more speed? Again, using more memory can be a way to improve performance, but if you don't have more memory, that's not a real option.

So let's talk about the challenge itself. This talk started as a post on the Elixir forum about an underperforming script, and a bunch of people helped out looking at it; José was looking at it. There was just something about the script that really caught me, and I ended up spending so much time modifying and changing things, even reimplementing core functions to see if I could make them faster, that I ended up thinking: I have all this information, I should put it somewhere. So I turned it into an article, and now it's a talk.

The script originally came from a blog post in German, which Google Translate did a pretty good job of translating. The task is to read from standard input; the input is ASCII, about 1.8 million lines, words separated by spaces and newlines, nothing complicated. You read from standard input, split that into words, count the frequency of those words, sort them, and pretty-print the output. Shouldn't be too difficult.

The blog post came with example implementations in different languages, and I ran them all on my machine to compare. I think the C version is interesting because we can look at it as a baseline: that's as fast as it's going to get. I don't think this is necessarily the best possible C version that has ever been written, but we can use it as an indication: okay, we should probably be within an order of magnitude of the C version, that's reasonable; we're probably not going to be faster than that. We can also see that it took something like 80 lines of code, while the article said the task should take something like 10, so brevity helps.

I also had a Python version, which took eight seconds. That's pretty impressive to me; eight seconds is super fast. There was a Ruby version at 17 seconds, and in Ruby's defense, that's fine; it depends on how often you're running the script. 17 seconds is kind of reasonable to me, although it used a lot more memory, and I'm not really sure why.

And then we have the Elixir version. This is one that I wrote, but it's very similar to the one from the forum post, and I think it's pretty good Elixir: it uses streams, it uses Enum, it has this gorgeous, beautiful pipeline all the way through. That's aesthetically pleasing. Let's walk through it. We start by creating a stream of lines, we flat-map with String.split to turn the stream of lines into a stream of words, then we reduce over that stream with a map and Map.update to keep track of the count of each word; that is then sorted by the count and pretty-printed. And this takes 120 seconds in Elixir.

So why is Elixir so slow? Let's do some audience participation. I'm going to name the different steps as I see them, and you raise your hand where you think it's spending its time; it's fine to raise it multiple times. Okay, first: reading? No one. Okay, splitting? Really, no one? Flat mapping? All right, yeah, some hands. Okay, counting the frequency of the words? People feel like that's the one.
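The stream-based version walked through above might look roughly like this. This is a reconstruction, not the exact script from the talk; the module wrapper and names are mine, and in the real script the input enumerable would be `IO.stream(:stdio, :line)`:

```elixir
defmodule WordFreq do
  # Count word frequencies from an enumerable of lines,
  # then sort by count, descending.
  def run(lines) do
    lines
    |> Stream.flat_map(&String.split/1)
    |> Enum.reduce(%{}, fn word, acc -> Map.update(acc, word, 1, &(&1 + 1)) end)
    |> Enum.sort_by(fn {_word, count} -> count end, :desc)
  end
end
```

The `Map.update/4` call is the counting step the talk describes: it inserts `1` for a word seen for the first time and increments the existing count otherwise.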
How about sorting? Some people like that one too. Okay, and finally, printing? Some hands. Let's see how your intuition does: this is where it spends its time. Now, if you're paying close attention to these numbers, they don't actually add up to 120 seconds. That's because I wanted to split the time by the different steps as I see them, and the original used streams, which of course interleave the function calls, so I had to get rid of the stream. I just turned it into a list, and that alone made it a lot faster, about 30 seconds faster. That's most likely because of the overhead of streams; José has referred to it as the "stream tax". You have to pay it if you're using streams. They have lots of other great properties, but they can slow your code down. In the breakdown we see that reading took a bit of time, because reading line by line is of course more effort than just reading blindly, and it spent a lot of time splitting and counting. So we know those are the areas we need to work on.

But first, a quick interlude. If you're talking about Elixir scripts, what often comes up is the startup time: "it's not really fair to compare". We can see that Python has a ridiculously low startup time; the number barely even registers. Ruby is a bit more reasonable, and Elixir is slower, but if you just keep running it, it goes down to something like 500 milliseconds. That's not crazy, and it definitely doesn't explain 120 seconds. We should also briefly talk about runtime performance versus developer performance: if you're here, you enjoy the value of writing less and getting more from your code. So when we're talking about performant string processing, we're not talking about beating C; we're talking about getting to a reasonable execution time.

So, the first direction. Elixir's data structures are immutable.
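A per-step breakdown like the one in the talk can be produced with a plain micro-benchmark; this is my own minimal sketch using `:timer.tc/1`, which runs a function and returns `{microseconds, result}`:

```elixir
words = String.split("the quick brown fox jumps over the lazy dog the end")

# Time just the counting step, in isolation from reading and splitting.
{micros, counts} =
  :timer.tc(fn ->
    Enum.reduce(words, %{}, fn w, acc -> Map.update(acc, w, 1, &(&1 + 1)) end)
  end)

IO.puts("counting #{length(words)} words took #{micros} microseconds")
counts["the"]
# => 3
```

As the talk notes, numbers from micro-benchmarks like this can be unreliable, and for a stream-based pipeline the steps interleave, so the stream has to be materialized into a list first to time each step separately.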
This is a great thing. I love immutable data structures: they eliminate a whole class of bugs and they're excellent for concurrent programming. But they do have some caveats. They work in most cases, they're a great, sensible default, but if you put too much data into them they slow down, and that's what we've been doing. Like I mentioned, this was 1.8 million lines and, I think, something like 500,000 unique words; that map does get slow. But when a map is not fast enough, you've got options.

There's ETS, Erlang Term Storage. It has super fast reads and writes, and they're constant time, meaning it just doesn't get slower as you fill up the table, and it's not garbage collected. This is what the ETS version looks like. It didn't really change a lot, except we now have that glaring ugly hole in the middle of our beautiful pipeline. The first part is creating the table; we don't really need to pass any options, the defaults are fine. We then use :ets.update_counter, which has a slightly arcane syntax with those tuples, and finally :ets.tab2list to bring the data back out of the table as a list, which we can then sort and print. And this makes it three times faster. Just replacing the map with an ETS table made it three times faster. So, lesson: maps are great, but they don't really scale with size.

Next, let's talk about string splitting; that was the other big one. String.split is this amazing function that just does what you want: it splits on any whitespace and by default gets rid of empty strings. But if you think about it, if we're splitting on every possible kind of whitespace, we're wasting a lot of effort, because we know we only have spaces and newlines to care about. In most languages you would reach for regular expressions, and here's a simple regular expression with a sigil that just matches spaces and newlines. And this makes it slower. It makes it a lot slower. Regular expression engines differ between languages, and some of them are really fast.
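The ETS counting described above can be sketched like this (a minimal reconstruction with my own names; in `:ets.update_counter/4`, the `{2, 1}` tuple means "increment position 2 of the stored tuple by 1", and `{word, 0}` is the default inserted when the key is missing):

```elixir
# Create a table; the default options are fine for the single-process version.
table = :ets.new(:word_counts, [])

for word <- ~w(to be or not to be) do
  # Increment element 2 of the {word, count} tuple by 1,
  # inserting {word, 0} first if the word isn't in the table yet.
  :ets.update_counter(table, word, {2, 1}, {word, 0})
end

# Bring the counts back out as a list, then sort by count.
table
|> :ets.tab2list()
|> Enum.sort_by(fn {_word, count} -> count end, :desc)
```

Unlike the map, the table lives outside the pipeline, which is the "hole in the middle" the talk mentions: the reduce step becomes a side-effecting loop instead of an accumulator.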
V8 has a super fast one; Go's is not that fast, and Elixir's, or rather Erlang's, is not that fast either. But that's not really a problem for us, because we've got another option: patterns. As the second argument to String.split you can pass a string, or a list of strings, that you want to split on. And that does make it faster: we're now down to 30 seconds, and we've proved our hypothesis that we could make it faster by reducing the set of whitespace we split on. You can improve this a little further by using a compiled binary pattern; ironically, you can't actually compile the pattern at compile time, you have to do it at runtime. That shaves off another second. Not huge, but it's something.

Finally, let's look at Unicode. Unicode support is not relevant for our script, but Elixir by default cares about Unicode, and that comes with a certain overhead. A lot of the implementations in the other languages probably didn't actually bother with Unicode support, but again, it doesn't matter here. So instead of IO.stream we can use IO.binstream, which is Unicode-unsafe, and that's also a considerable speedup: just by dropping that, we're down to 22 seconds. The code looks like this now; again, it's not that different, apart from the unspeakable thing in the middle. And here's the breakdown: we've improved the difficult areas. Splitting is down by half, reading as well, and counting looks really nice at six seconds. This is pretty nice; it's really performing well here. But we can do better.

So let's go in a different direction and see what happens if we use more memory to make it faster. Streams are great when it comes to memory efficiency, because you can produce values lazily, handle them, and then release them to be garbage collected. But calling split 1.8 million times is a lot slower than calling it once with a string that's 1.8 million times larger.
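Splitting on an explicit pattern rather than "any whitespace" looks like this (a sketch; `trim: true` drops the empty strings that `String.split/1` would have dropped for us automatically):

```elixir
# Compile the pattern once, at runtime, and reuse it for every split.
pattern = :binary.compile_pattern([" ", "\n"])

String.split("to be\nor not to be", pattern, trim: true)
# => ["to", "be", "or", "not", "to", "be"]
```

Passing the plain list `[" ", "\n"]` works too; compiling it up front just avoids recompiling the pattern on every one of the 1.8 million calls.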
So we can do something different. Like I mentioned before, getting rid of the stream itself makes everything faster. Here we use IO.binstream again, but we pass a chunk size instead of :line, and then we immediately turn the whole thing into a single string. This is faster than using IO.read; IO.read has an :all option, but it uses a fairly small chunk size internally, so this is much faster. Then we just split it, and we don't even have to care about setting up a pattern anymore, because splitting a single time is so efficient. We're down to 16 seconds. So we beat Ruby, but we are using more memory than Ruby.

Now we should talk about IO lists. IO lists are this really cool thing. A lot of virtual machines have a built-in optimization for string concatenation, and the basic problem is this: if you're concatenating strings, say you have three, you first have to concatenate the first two, which gives you an intermediate string, and then you concatenate the intermediate one with the third to get your final result. You had to allocate five strings to get your result. What V8, for example, does in this situation is that it doesn't actually make an intermediate string; it instead keeps a list of all the strings you tried to concatenate, and then, if you at some point try to look at the string, it turns the list into a string efficiently. Java has an opt-in way of doing this with StringBuffer, but from what I understand the JVM can also do it magically behind the scenes. In Elixir we have the option of opting in to this behavior: there's a function called :erlang.iolist_to_binary which turns a list of strings into a single string efficiently. Furthermore, a lot of functions concerned with IO can take IO lists natively: if you pass a list of strings to IO.puts, it just prints a single string, and you can even nest these lists arbitrarily. You can see this in Jason, for example: the Jason library has an encode-to-iodata option where it produces an IO list instead of a single string.
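A small illustration of the IO-list behavior described above (the example list is my own):

```elixir
# Nested lists of binaries stand in for the concatenated string...
parts = ["hello", [" ", ["beautiful", " "]], "world"]

# ...and are flattened into a single binary only once, when you need it:
:erlang.iolist_to_binary(parts)
# => "hello beautiful world"

# Many IO functions accept IO lists natively, so often you never flatten at all:
IO.puts(parts)
```

No intermediate strings are ever allocated; building the nested list is just cheap cons cells, and the one flattening pass does all the copying.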
Phoenix uses this by default: if you use Plug's register_before_send you can inspect the payload before it's sent, and it's just this huge list of lists of lists of strings. This gets us down to 13 seconds, which is pretty good; this is basically as far as I got in this direction. And this is what it looks like now. Again, there are not a lot of changes; we've mostly just replaced lines with other things. It has some added complexity because it introduces new concepts, but the flow of the code, what you're doing, is about the same as before. Here's the breakdown: we don't really have any terrible parts anymore. We still spend a lot of time splitting and counting, but they don't look as bad as they did before.

But we can do better than this. We're using Elixir; we should be using concurrency to solve this problem. The concurrency story in Elixir is great, so we should use it. We can start with some simple concurrency using the Flow library by Plataformatec. If you look at the documentation for Flow, one of the examples is literally counting the frequency of words, so this seems like the right thing to do. The script gets a little bit longer, though not much. You have to pass some options to the ETS table: :public, so that other processes are allowed to write to it, and :write_concurrency, to optimize for multiple processes writing at the same time. Then, in the part we change, we basically replace the Stream and Enum functions with Flow; otherwise it's the same as before. And this is again like the first version, because if you want to split work up, you have to have parts of work; if you just make one huge string, you don't have parts to split up. This runs in 13 seconds, so it's as good as the final version from the memory-heavy direction, and it's kind of memory-inefficient too. It also makes your computer very warm: it uses all of my cores for those 13 seconds.
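The Flow version follows the word-count example in Flow's own documentation fairly closely; here is a sketch under the assumption of a `{:flow, "~> 1.2"}` dependency and a `words.txt` input file (the module name and `path` parameter are mine):

```elixir
defmodule FlowCount do
  # Partitioning routes equal words to the same process, so each
  # partition can keep its own independent counts in its reduce state.
  def count(path) do
    path
    |> File.stream!()
    |> Flow.from_enumerable()
    |> Flow.flat_map(&String.split/1)
    |> Flow.partition()
    |> Flow.reduce(fn -> %{} end, fn word, acc ->
      Map.update(acc, word, 1, &(&1 + 1))
    end)
    |> Enum.to_list()
  end
end

# For the ETS variant, the table must allow concurrent writes
# from processes other than its owner:
table = :ets.new(:word_counts, [:public, write_concurrency: true])
```

As in the talk's version, the input stays line- or chunk-shaped rather than one huge string, because Flow needs discrete pieces of work to distribute.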
I think Flow is a great library, but it might be overkill for this specific script. And realistically, if we're insisting that Elixir is great for concurrency, the core library should have something for this. It does, in the form of Task.async_stream. This version of the script was originally written by Evadne Wu; she made this amazing version, and I stripped it down a little bit to make it easier to show, so if it's ugly or wrong, that's my fault. It's a bit longer, and it's actually a module, to reduce some repetition, but I'll step through it part by part.

We have the initial stuff with the table and the pattern, and we create our input stream. The new thing is that we create a task function that gets run by async_stream. Just like Flow, async_stream automatically parcels the work out, by default onto as many processes as you have schedulers, though you can change that. If we look at the task function: we need to split the input, which is a chunk in this case, into words, and then call process_words. We'll look at what process_words does in a moment, but what's interesting here is that we're working with chunks, not lines, and a chunk doesn't know where a line starts or ends. So we have to keep track of the beginning and end of every chunk, so that we can glue them back together later.

Next step: processing words. Let's start from the bottom. The final case is when there are no more words, and we return the prefix and the suffix as a tuple; so the async_stream we just looked at turns the stream of chunks into a stream of prefixes and suffixes. The first case is when there is no prefix yet: we grab the first item from the list of words and keep it as the prefix. We do the same thing for the suffix when we reach the last item in the list. And any other word we can just count directly.
Now that we've done that, we're at the next part: we have to actually glue those boundary words back together, which we do by transforming the stream at the end. This is pretty straightforward; we just glue them back together. Then we have the final part where we finish up the work, which looks about the same as before. And this finishes in seven seconds. That's pretty good, pretty impressive, and the memory usage is not bad either. It's a lot more work than the original version; it's not ten lines anymore. But I still feel like I probably wouldn't want to write this in Python or Ruby.

To do a quick recap: we talked about optimization; we worked through the examples with small incremental changes; we talked about what those changes did and how they affected the execution time; we looked at some alternatives to the standard tools; we explored trade-offs like memory usage; and finally we applied concurrency to the problem. So, the takeaways: Elixir has sensible defaults. We've got Unicode safety by default, we've got immutable data structures by default, and those are great defaults; but when they're not enough, there are options. That's my talk. Thank you.
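The Task.async_stream approach can be sketched like this. This is a simplified version of my own: the real script processes fixed-size chunks and glues the words cut at chunk boundaries back together via the prefix/suffix bookkeeping described above, while this sketch fans out whole lines, which avoids that step entirely:

```elixir
defmodule ConcurrentCount do
  # Task.async_stream runs the function concurrently, by default on as
  # many processes as there are schedulers, preserving input order.
  def count(lines) do
    lines
    |> Task.async_stream(fn line ->
      line
      |> String.split()
      |> Enum.frequencies()
    end)
    # Each result arrives as {:ok, counts}; merge the per-line maps.
    |> Enum.reduce(%{}, fn {:ok, counts}, acc ->
      Map.merge(acc, counts, fn _word, a, b -> a + b end)
    end)
  end
end
```

Merging maps works for a sketch; the full version writes into a shared `:public` ETS table instead, so the workers never hand maps back at all.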
Info
Channel: ElixirConf
Views: 3,213
Rating: 4.8888888 out of 5
Id: Y83p_VsvRFA
Length: 26min 43sec (1603 seconds)
Published: Fri Aug 30 2019