GopherCon Europe 2023: Jonathan Amsterdam - A Fast Structured Logging Package

Captions
Thanks, everybody. I'm really excited to be here. I'm Jonathan from the Go team at Google, and I'm going to talk to you about fast structured logging. (In case you're wondering: you can't see the background on the screen, but on the actual slides, in the PDF that will become available, there's an old captain's log from a ship in the background.)

In Go 1.21 we're going to have a new package called slog that does structured logging. I'll give you a very quick overview of slog, then talk about why we felt we needed to make it fast, and then how we made it fast.

First, a bit of slog. It's a structured logger, which means it outputs key-value pairs, which I'll call attributes throughout this talk. Those are great because you can write programs to easily filter, change, or aggregate your logs. Unlike the built-in log package, slog also supports levels, and it supports changing the output format by installing a handler. I'll have a lot more to say about handlers, but the two built-in handlers give you these outputs: key=value format and JSON.

Here's some slog code. In the middle we're constructing a new Logger using the text handler, writing to standard out. Then we log something at the info level, with the message "processed item" and three alternating keys and values, and the output is, as I said, key=value. That's fine, and we expect most people will use that API. But if you want speed, use the LogAttrs method instead. It takes a few more arguments, and it uses these Attr things, which bundle a key and a value together. I'll explain in detail why that's faster. You construct Attrs by type, and some people prefer this anyway, because you can't get confused about which argument is the key and which is the value. The output is the same. That's all you actually need to know about slog for this talk.
You can always go to pkg.go.dev: in the x repo, at golang.org/x/exp/slog, you can find the experimental version, and if you look at log/slog at master you should be able to see the version that isn't released yet but will be in Go 1.21, which I think is due out around August.

So why did we feel we had to make slog fast? Here's one answer: logging is often in the inner loop. Servers do thousands of requests per minute, each request can generate thousands of log messages, and each log message can take significant time to process. That's not a great answer, though, because it doesn't explain why we couldn't just write it in good, clean Go and hope it's fast enough. For the real answer we have to look at the history of structured logging in Go.

The first structured logger, nine years ago, was Logrus. It's still very popular even though it's no longer maintained, with 99,000 importers, but it's relatively slow. In 2017, the engineers at Uber noticed that logging was consuming a significant amount of their server CPU, so they wrote a faster logger called zap. Zap is also quite popular and much, much faster than Logrus. That spurred the community to go even faster, so there's a whole bunch of loggers that are faster than zap, zerolog probably being the most popular of those, although it's not nearly as popular as zap.

Based on this, we decided to aim for zap-level performance. We felt that captured the majority of use cases, since zap is still quite popular, and Logrus too. We did look at trying to get to the level of zerolog, but when we looked carefully at the zerolog API, which is lovely by the way, we couldn't see how to match its speed and keep it safe: zerolog actually has a subtle use-after-free hazard that we couldn't figure out how to get rid of. You may wonder how you can have a use-after-free bug in a language with garbage collection, automatic memory management. I will answer that question later.
So that's why we made it fast. How did we make it fast? We followed a five-step program, and the first step is probably the most important: know what you want to make fast. You can't make everything fast; you have to pick the common cases. And how do you know what the common cases are? You look at what's out there that does similar things, and you analyze it. People often ask: why not add all this stuff to the standard library now, instead of waiting ten years? We waited ten years because now we can look at what's out there and design the right thing for the community.

Part of that was the discussion process I'll only mention briefly; yesterday at the panel I talked at length about our process for coming up with the slog API. That process was really valuable for features and API, but when it came to actual performance numbers, the best thing we could do was look at what was out there. So I searched through a lot of importers of the popular log packages and aggregated some information. I'm going to show you two graphs that relate to zap.

Here you see the cumulative distribution of the number of attributes, key-value pairs, in calls to zap. On the slide there's an example of a zap call to the Info method that takes three attributes, so that call counts as three. And you can see something pretty striking: once you get to five, you've covered about 95% of calls, and adding more doesn't change that very much. There is a long tail, of course, but five is the magic number where you've captured almost all the calls.

I was also curious about the typed constructors, like zap.String and zap.Int. Zap has about 32 of those, and generally in the standard library we don't have 32 of anything, so I wanted to see if I could get that number down. Indeed, we see a similar picture: once you get to about eight constructors, you're at 95% of everything.
So slog has almost the identical constructors, but only about eight of them.

Now that we've looked at the data, we can write benchmarks that capture our expectations. Here's a typical slog benchmark. The first thing to note is that I'm using RunParallel in the benchmark code, because you typically see logging in a highly concurrent environment: servers start a goroutine per request and process many requests simultaneously, so we wanted to understand the performance when all those concurrent things are happening. We focused on calls with five attributes, which we found was the sweet spot, and we used common attribute types based on that second graph I just showed you. We also tested with more arguments, up to 40, just to make sure there was no performance cliff, no place where we were doing something horrible.

Here's a general tip: always write the call in a benchmark as it would appear in user code. If you're benchmarking five, ten, or twenty arguments, it's very tempting to make slices of those arguments and write a loop, all the stuff you usually do when you write code. That would be a mistake here: if you made a slice of those five things and passed it in with ..., the compiler would generate different code and you'd get different numbers, wrong numbers, since most people just write the arguments inline. So write your benchmark calls the way users would actually use your API.

Last but not least, at the bottom of the slide there's a link to a wonderful program called benchstat, written by other people on the Go team. If you're doing any sort of performance work where you run some benchmarks, make changes, and run the benchmarks again, benchstat will take those two sets of benchmarks and compare them, telling you the percent speedup or slowdown.
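A benchmark in the style described here might look like the following sketch (the benchmark name and attributes are made up; io.Discard keeps I/O cost out of the measurement):

```go
package main

import (
	"context"
	"fmt"
	"io"
	"log/slog"
	"testing"
)

// The call is written inline, exactly as it would appear in user code,
// and RunParallel exercises it from many goroutines at once.
func BenchmarkLogAttrsFiveAttrs(b *testing.B) {
	logger := slog.New(slog.NewTextHandler(io.Discard, nil))
	b.ReportAllocs()
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			logger.LogAttrs(context.Background(), slog.LevelInfo, "processed",
				slog.String("method", "GET"),
				slog.Int("status", 200),
				slog.String("path", "/items"),
				slog.Bool("cached", false),
				slog.Int("bytes", 512))
		}
	})
}

func main() {
	// Benchmarks normally live in a _test.go file and run under
	// `go test -bench`; testing.Benchmark drives one directly here.
	fmt.Println(testing.Benchmark(BenchmarkLogAttrsFiveAttrs))
}
```

After saving the output of two benchmark runs to files, `benchstat old.txt new.txt` compares them.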
Benchstat will also tell you whether that difference is statistically significant, or whether you have to gather more samples. It's a really great tool that I recommend to anyone doing this sort of performance work.

OK, I'm going to switch now to the topic of API. This is the architecture of slog. There's the user-facing Logger, whose API I'm going to say very little about. What I want to focus on in this section is the back end, the Handler, which is the thing the Logger calls. The Logger takes your Info call, or whatever, bundles everything up into a Record, and gives it to the Handler. Handler is an interface, so you can write your own or use the built-in ones, and a Handler can do whatever it wants.

You might expect from this that Handler would have one method, Handle, that takes a Record, and indeed it does. But it also has a few other methods that are there solely for optimization, and I'll talk about two of them.

Many, if not most, log lines never actually execute; they never produce any output, because they're there for debugging and you only turn them on when you really need them. Those disabled calls should take as little time as possible, and that's what the Enabled method is about. The first thing the Logger does is call Handler.Enabled, and if that returns false, we're out; we don't do anything else, so there's minimal processing. That's an important, and somewhat obvious, optimization.
The next method, WithAttrs, is a little more interesting. What we noticed by looking at all this code is that people build their log lines in two phases; almost all the loggers have this feature. This is the way it looks in slog: you can take a logger, call .With, and give it a bunch of attributes, and those become part of the logger and will be included in every log call you make with that logger. Here we're taking l1 and adding a request attribute to get l2, then doing some logging with l2, maybe twice, maybe a thousand times. Each call will include the attributes you write in that Info line, like a or b, but it will also include request.

Converting that request argument, say an HTTP request, a pretty big thing, into a sequence of bytes might be rather expensive. So if we could pre-format it, doing that conversion once when With is called instead of every time we actually log, we could get a big performance speedup. That's an observation zap made, and I basically stole this, and many other things in slog, from zap. Thank you, zap. So the Logger.With method immediately calls the Handler's WithAttrs method, and the Handler can optimize those attributes by pre-formatting them. It doesn't have to; it could just save them and use them later as part of the regular logging process. But it at least has the opportunity.
For the last two points I have to insert a warning. I'm going to talk about low-level optimizations, and everything I'm going to tell you is wrong for almost all the code you write; you should only do these things when you need them. I will repeat that several times in case there are new Go programmers in the audience: these are advanced performance techniques, not normal coding.

The first is about avoiding indirect calls, by which I mean calls through an interface or through a function pointer. An indirect call is a little slower because of the level of indirection, but more importantly, it squashes optimizations the compiler can perform, because the compiler can't see exactly what code you're calling at the call site. The compiler can't inline the callee, and inlining not only saves the function call but, more interestingly, enables further optimizations, because all the code is now in one place where the compiler can think about it. Indirect calls also defeat escape analysis, which I'll talk about in a minute and which has to do with memory allocation. For these reasons, Logger is a concrete type, a pointer to a struct, and not an interface. That's just a vague idea for now; I'll get precise about exactly how it saves allocations in a second.

The rest of the talk is about avoiding memory allocation. We'll go over several scenarios, and I'll show you how we avoid allocation in each one. By memory allocation I mean heap allocation. The other kind, stack allocation, is very fast: to allocate something on the stack you increment a register, and to free it you decrement the register. Heap allocation will never be that fast. Even though Go's heap allocation has improved over time, it could never be that fast, so heap allocation is always slow relative to stack allocation. Furthermore, it creates garbage, and the garbage collector has to spend time collecting that garbage. By the way, there was a talk yesterday about memory usage; here I'm talking about memory indirectly, allocation as it affects time, not space.

So who decides whether something goes on the stack or the heap? The compiler, by doing a static analysis called escape analysis. It looks to see whether, say, an argument to a function call ever appears outside that function: does the function put it in a global, or return it, or something like that? If it doesn't escape the function, the compiler can allocate it on the stack; otherwise it has to heap-allocate it. And of course, if the compiler can't see the called function's code, because the call is indirect, it doesn't know, so it has to assume the worst case and heap-allocate. It's very hard to predict what escape analysis will do; it's very clever and gets better all the time. But you can use the -m compiler flag to at least see the decisions it's making: it will tell you, for every variable in your program, whether it escapes its function or not.
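A tiny illustration of those decisions (the point type and function names are made up; run `go build -gcflags=-m` on a file like this to see the compiler's reports):

```go
package main

import "fmt"

// Build with `go build -gcflags=-m` to print the compiler's escape
// analysis decisions for every variable in the file.

type point struct{ x, y int }

// p is only used inside the function, so it does not escape and can
// live on the stack.
func sumPoint() int {
	p := point{3, 4}
	return p.x + p.y
}

// p's address is returned, so p must outlive the call; the compiler
// reports it as "moved to heap".
func newPoint() *point {
	p := point{3, 4}
	return &p
}

func main() {
	fmt.Println(sumPoint())   // 7
	fmt.Println(newPoint().y) // 4
}
```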
Once you've optimized your code, say you've got it down to three allocations, you want to lock that in, so that subtle changes during further development don't mess it up. Use testing.AllocsPerRun in your tests: it tells you how many allocations happen, and you can assert that the count is at most three. That's a really useful technique for making sure future work doesn't undo your performance optimization.

Let's look at a few scenarios, starting with variadic arguments, those ... arguments. Here's a really simple function that just adds up its variadic argument, and inside the function that variadic argument appears as a slice. So where is that slice allocated? You can see that the argument does not escape; it's only used in the function, never put into a global or returned or anything like that, and the compiler can see that too. So when you call sum directly, that slice is put on the stack and there's no heap allocation. But if you call it indirectly, through a function pointer or an interface, the compiler will put that slice on the heap.
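Here is a sketch of the variadic scenario, with testing.AllocsPerRun thrown in as the lock-it-in tool just mentioned (function names are made up):

```go
package main

import (
	"fmt"
	"testing"
)

// The variadic parameter arrives in sum as a slice; whether that slice
// lives on the stack or the heap depends on whether the compiler can
// see the call target.
func sum(nums ...int) int {
	t := 0
	for _, n := range nums {
		t += n
	}
	return t
}

func main() {
	// Direct call: the compiler sees sum's body, proves the slice does
	// not escape, and can keep it on the stack.
	fmt.Println(sum(1, 2, 3)) // 6

	// Indirect call: through a function value the callee is unknown at
	// the call site, so the argument slice must be heap-allocated.
	f := sum
	fmt.Println(f(4, 5)) // 9

	// testing.AllocsPerRun locks an optimization in: a test can assert
	// the measured count so future changes can't silently regress it.
	fmt.Println("allocs:", testing.AllocsPerRun(1000, func() { sum(1, 2, 3) }))
}
```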
So let's get back to the whole business about Logger being a concrete type; here's the whole story. Logger is a concrete type, and the LogAttrs method takes a variadic argument of Attrs. A call to LogAttrs is a direct call, so whatever you put there turns into a slice, but that slice can live on the stack, because LogAttrs is carefully written not to let it escape. And that's true even though the first thing LogAttrs does is make an indirect call: Handler is an interface, so Handler.Enabled is an indirect call. That doesn't bother escape analysis, because we're not passing the variadic argument to that call, so it doesn't affect the analysis of that argument. A lot of people asked for Logger to be an interface, because that results in a really nice design where anyone can implement anything they want. But if Logger were an interface, we would be allocating that slice even when the call was disabled, which is bad, and that's why it's a concrete type.

That's the start of the logging process, calling LogAttrs. At the other end, the Handler takes all the stuff you gave it and transforms it into a sequence of bytes for output, so let's talk about how to optimize that. The first code block here is absolutely what you should do almost all the time. Do not go home and say "jba said to do all this other fancy stuff"; do the first thing unless you have a performance issue. What's slow about it (not wrong, slow) is that as you continually append, you grow that byte slice: it has to be reallocated at about twice the size, with everything copied over, and depending on how many appends you do and how much space you use, that can happen repeatedly. In the second code block we get down to one allocation by making the initial size big enough for our common case. You have to experiment to see what your common case is, but this is actually the number we use in the built-in handlers: 1024 bytes seems to cover most log lines, so all those appends do no additional allocation.

You can get down to zero allocations with a sync.Pool. A sync.Pool is a manual memory management technique; someone asked yesterday whether Go has manual memory management, and it does, in sync.Pool. You can Put and Get things in this pool, and if you Put something and then Get it, you're getting the same one back, with no new allocation. So if you use a pool, yes, there's some initial allocation as you build up a small pool of buffers, but in the steady state everyone is just getting a buffer that someone else put back, and there's no new allocation.
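A sketch of the pre-sized buffer plus sync.Pool technique (the names and the 1024-byte size are illustrative, not slog's actual internals):

```go
package main

import (
	"fmt"
	"sync"
)

// A pool of output buffers. Each buffer starts at 1024 bytes, so for a
// typical log line the appends below never have to grow (and therefore
// reallocate) the slice.
var bufPool = sync.Pool{
	New: func() any {
		b := make([]byte, 0, 1024)
		return &b // pool a pointer, to avoid an allocation on every Put
	},
}

func formatLine(key, value string) string {
	bp := bufPool.Get().(*[]byte)
	buf := (*bp)[:0] // reuse the backing array, reset the length
	buf = append(buf, key...)
	buf = append(buf, '=')
	buf = append(buf, value...)
	out := string(buf)
	*bp = buf
	bufPool.Put(bp) // hand the buffer to the next caller
	return out
}

func main() {
	fmt.Println(formatLine("status", "ok")) // status=ok
}
```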
Now let's talk about the scenario of passing something, in this case the Record, to an interface method: an indirect call. If we used a pointer to a Record, the Record would have to be heap-allocated, because the compiler can't see into the indirect call, so escape analysis wouldn't work. And anyway, we want to allow handlers to make the Record escape: a perfectly good Handler takes the Record and puts it on a channel for asynchronous processing, so even if the compiler could see into the call, there are handlers where the Record would escape. So how do we avoid allocating a Record? We just copy it; we pass it by value. That means the Record can't be too big, or we'll spend a lot of time copying.

Inside the Record, among other things, there has to be a slice of Attrs, because users can provide as many attributes as they want in a log call. Using a plain slice is, again, absolutely the right thing to do in almost all cases; just do the thing that's crossed out on the slide. But if speed is critical, we can do a little better. We can't use a sync.Pool for Records, though. Or rather, we decided not to: you can, but it's a bad idea for us, because users have control of Records. Handler writers can get their hands on a Record, and we'd have to ask them to free it, or try to free it magically at some point. Either way, they could get themselves in trouble by freeing it twice, or by holding on to one after freeing it. That's a use-after-free bug, and that's why zerolog was problematic for us: zerolog, as part of its speed, exposes a pool-allocated value to users. If you use zerolog the normal way, the way you'll see in all the examples, you will never encounter a problem, but if you do something a little out of the ordinary, you can get use-after-free bugs, and we just didn't want to put that in the standard library. So sync.Pool was not available to us. What did we do instead? We put a little array of size n in the Record, and that array holds the first n Attrs. That's no allocation; the array is just part of the Record. Then we use a slice if there are more than n.
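In outline, the trick looks like this (field and type names are assumptions for illustration, not slog's actual internals; real slog attributes hold a Value, not a string):

```go
package main

import "fmt"

// The first nAttrsInline attributes live in a fixed-size array inside
// the Record itself: they are copied along with the Record and cost no
// allocation. Only calls with more attributes spill into the back
// slice, which lives on the heap.
const nAttrsInline = 5 // covers roughly 95% of calls, per the survey

type Attr struct {
	Key, Value string
}

type Record struct {
	front  [nAttrsInline]Attr // inline storage, part of the Record
	nFront int                // number of front slots in use
	back   []Attr             // overflow, allocated only when needed
}

func (r *Record) Add(a Attr) {
	if r.nFront < len(r.front) {
		r.front[r.nFront] = a
		r.nFront++
	} else {
		r.back = append(r.back, a)
	}
}

func (r *Record) NumAttrs() int { return r.nFront + len(r.back) }

func main() {
	var r Record
	for i := 0; i < 7; i++ {
		r.Add(Attr{Key: fmt.Sprintf("k%d", i), Value: "v"})
	}
	fmt.Println(r.NumAttrs(), len(r.back)) // 7 2
}
```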
So what is n? If it's too small, we'll allocate on a lot of log calls; if it's too large, we'll have a large Record that has to be copied. Well, we know the answer to this question, because we did the research: it's five. We know that if n is five, we capture about 95% of all logging calls, and that seems perfectly good.

Last but not least, the Attrs. An Attr is just a key-value pair; it's very simple. The key is a string, but the value is not an empty interface, an any. It's a special type, slog.Value, and it's the last and maybe the cleverest of the allocation optimizations we do. When you construct, say, a string attribute, the key is just the key, and for the value we call the StringValue function to turn an ordinary string into a capital-V Value. Why Value? We're going to trade space for allocations.

To understand that, you have to understand how things are represented in Go. This is an interface value: two machine words. The first is a pointer, it's always a pointer, and the second is some type information. This is how strings are represented, also two machine words: the first is a pointer to the actual data, the bytes, and the second is the length. When you put them together, when you assign a string into an any, you get this picture: two arrows, two allocations. Now, the string's own allocation is in your program anyway; you had to do that, so it doesn't count. But the extra allocation when you put the string into the any is what we're going to avoid.

Let's look inside a Value. A Value has an any in it, but it also has an extra machine word called num. Values are like reflect.Value: there's a kind, a small-integer enum, that tells you what kind of value is in there. Now I'll show you how we represent some simple things, and then we'll get to strings. For an Int64, the number is just cast to a uint64 and stored in num, and we use that whole any field just to hold the small integer that is the kind.
Now, I just got through telling you that every value in an any is a pointer, so aren't we allocating a pointer to that small integer? No: in the Go runtime there's a static array of the small integers, and when you construct an any from a small integer, you get a pointer into that array. So yes, it's a pointer, but there's no allocation at runtime; it's a static array built into the runtime. We do the same thing for floats and a few other types; nothing here is very interesting. There's a nice function, math.Float64bits, that converts a float64 to a uint64, and again we use the any field for the kind.

Now let's look at strings. Here's what we do: the length of the string goes into the num field, and the any field holds the string data. There's a function in unsafe called StringData that gives you a pointer to the raw bytes. It's unsafe because if you play with those bytes you can get yourself in trouble, but we're not going to play with them; we're just going to store the pointer in the any. And we use our own internal type for it, so we know that we actually have string data in there. So a Value is these two fields: the any, holding a pointer to the raw bytes tagged with our internal type, and num, holding the length. We've gotten rid of the extra allocation, because, again, the string data is already part of your program; we've traded one machine word to get rid of that allocation. When you put a string into a slog.Value, there's no extra allocation. We also do something clever, in a similar vein, with the time.Time type.
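A sketch of the Value representation trick (types and names are illustrative; the real slog internals differ in detail). It relies on unsafe.StringData and unsafe.String, which exist as of Go 1.20:

```go
package main

import (
	"fmt"
	"unsafe"
)

// Value spends one extra machine word, num, to avoid the allocation
// that boxing a string or an int64 into an `any` would cost.

type (
	stringptr *byte    // marks that any holds a pointer to string bytes
	kindInt64 struct{} // marks that num holds an int64's bits
)

type Value struct {
	num uint64 // the int64 bits, or the string's length
	any any    // a kind marker, or a stringptr to existing string data
}

func Int64Value(i int64) Value {
	// The kind marker is a zero-size struct, so boxing it allocates
	// nothing; the number itself lives in num.
	return Value{num: uint64(i), any: kindInt64{}}
}

func StringValue(s string) Value {
	// Point at the string's existing bytes: no copy, no new allocation.
	return Value{num: uint64(len(s)), any: stringptr(unsafe.StringData(s))}
}

func (v Value) String() string {
	if p, ok := v.any.(stringptr); ok {
		return unsafe.String((*byte)(p), int(v.num)) // rebuild the header
	}
	if _, ok := v.any.(kindInt64); ok {
		return fmt.Sprint(int64(v.num))
	}
	return fmt.Sprint(v.any)
}

func main() {
	fmt.Println(StringValue("hello").String()) // hello
	fmt.Println(Int64Value(-3).String())       // -3
}
```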
That's all I have for you, so I'll just summarize. The design of slog was informed by almost a decade of prior art, what the Go community developed for structured logging. We waited a long time, but it was worth it, because we got a lot of information about what people really want. The same goes for performance: you can always make arguments about why something needs to be fast, like "it's in the inner loop," but until you look at what the community actually does, you don't have a good sense of how fast it needs to be. We were able to use that data to pick our performance goals. The third point is about designing a back end like Handler: whenever you have an API with a user-facing part and a back-end part, make the back-end part more complicated to enable optimizations, because relatively few of your users will write that back end; most will just use the front-end part. So we made the Handler API a little more complicated to enable optimizations. And last but not least: if you avoid allocations, you can often save a lot of time. Thank you very much.
Info
Channel: GopherCon Europe
Views: 2,292
Keywords: #gopherconEU, #golang
Id: tC4Jt3i62ns
Length: 25min 37sec (1537 seconds)
Published: Fri Aug 25 2023