CppCon 2018: Tom Poole “Why and How to Roll Your Own std::function Implementation”

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments
Captions
- [Tom] Hello everybody, I'm Tom Poole and today I'm gonna talk to you a bit about some alternate std function implementations. Say the main motivation for this work is, or at least the main motivation for this talk, is some work I did a while ago which was adding a std function replacement to the JUCE framework, which is an opensource cross-platform effort. I'm not just saying that to name drop the product that I'm working on. There are a couple of aspects of JUCE that means that a std function replacement is kind of uniquely valuable. These two things are that JUCE has quite broad cross-platform compatibility requirements. It's also focused on audio applications. So we'll get to those two in a minute, but first it's worth just giving a quick std function recap. So in a single sentence, std function is a type-erasing wrapper that holds something that is callable with a specific signature and so the things that are callable in C++ are function pointers, objects with a call operator, and lambda functions. There's just a quick example of each of these on the left hand side. We have function, we have some struct which has an implication operator so we can call it and down at the bottom, we have a lambda function. Each of these can be assigned to a std function and that's good because we can treat them all on an equal footing. So why is that useful? I think it's fair to say that std function really has changed the way a lot of programs are written, not programs in their entirety but certainly bits of them. If you're interacting with anything that is asynchronous, before the advent of std function, the typical approach is that you'd call an asynchronous function and you'd have to provide it with a function pointer and maybe some void star data, to then get control back to when you started from and the basic process would be, you'd save your this pointer into the void star data, then when you get a call back, because your asynchronous function has either finished or it says an error or something, you then cast this void star pointer back to the, in an instance of yourself, and you're back where you started from. You can monkey around with all of your internal data again. With std function, you can simply put this in the capture list and everything is much more contained. You don't have these global functions littering your code base where the only point of them is to get you back where you started from. And std function is also a really good fit for a lot of audio processing operations which brings us back into what I actually used it for in JUCE. So you can think of almost all audio effects as being successive transformations on arrays of audio samples. So each of these transformations has the same signature. You have some input samples, you do something with it and then your output is the output samples. And because of all these transformations have the same signature, swap-able arrays of callable objects is a really convenient structure that you can use to have a way of changing behavior dynamically. Or another example would be, something like an oscillator class. An oscillator class does something periodic and the user can provide a std function to determine what the wave function would look like in that periodic bit which is then swap-able at runtime. The std function is really useful and people who use JUCE have been using it to get things done and so this is important because JUCE supports a lot of systems and as you get really old, on the desktop side, or really weird on the embedded side, then you run into situations where actually the standard library lacking a std function is a reasonably common occurrence particularly in the tool chains for embedded systems, you might lack a std library all together or it would be made by someone who is much more concerned with constraining it in terms of size, rather than having all the functionality available. And also if you want to target OS X older than 10.7, you'll find that you are free to use a really modern compiler but you are limited to the standard library that's available on that platform which gives us most of C++ 11 but std function is the real stand out that the JUCE users really wanted. And the other aspect of why std function is particularly important when you're working with audio, so using with JUCE is that JUCE works with audio and processing live audio is extremely unforgiving. So the process of dealing with audio starts out with, you do some communication with the audio device in your system to say, okay, you tell it, I want to do something with audio and you do that by supplying it with a call back. This call back has at least in its arguments a pointer to some memory and the size. Once you've told the audio device about this, at some point in the future, it's going to call that call back. And once it's done that, you have a short window of time where you fill that block of memory with some audio data and then a short time after that, that memory will be read by the sound card. The crucial thing here is that if you don't finish writing to that memory, you will hear a glitch. This is not really like graphics applications where you're just blasting frames to the screen. If you drop a frame, unless you're looking out for it, you probably won't notice but if you drop a single sample, you will hear it and it will sound terrible. And so why is that bad? Well if your audio software produces glitches, then people will not buy it and this is not particularly because musicians are fickle or anything like that, but if you're broadcasting live, or even doing something quasi-live like recording a once in a lifetime take, if your audio glitches during that, the whole recording is affectively ruined. And even worse, if you are doing your bit of audio processing towards the end of a signal chain on a really loud sound system, then you can actually cause permanent hearing damage to anyone listening to it. So because of that, there really is one golden rule of audio programming and it is, if you're working on the audio thread, you do not do anything where you don't know ahead of time exactly how long it's going to take. And if you're looking at audio thread code, two things stand out as big red flags, the first is any locks and the second is any memory allocation. So locks are bad because, well on this MacBook at least, the audio thread has a higher priority than any other, so as a programmer, you cannot create another thread that has the same priority. If you're locking, it is almost certainly because you want to communicate with another thread and as soon as that thread is locked by a lower priority thread, you have affectively demoted the priority of the audio thread to the one that is sharing the lock and as soon as you do that, the OS can just decide not to schedule that thread for whatever reason and suddenly you miss your deadline and you hear a glitch. Memory allocations are bad, first and foremost, because they take quite a long time. We're generally working in a very time constrained demand. We want everything to happen as fast as possible. The really nasty thing about them is that they can take an unpredictable amount of time, so every now and again, your fetch from main memory will take much longer than it used to and this again can push you over your deadline and you hear a glitch. Why is this a problem when you're working with a std function? Well, std function allocates sometimes. We'll talk about this a little later in the talk but std function can avoid allocations by using a small bit of stack space in each object, it's called small buffer optimization and the same thing happens in std string. So if the thing you are storing in your std function is smaller than the stack space, then the stack space would be used and you wouldn't have allocation. If it's bigger than that stack space, then we need to fetch some memory from the heap to hold it. And the annoying thing is that, the size of the stack space is implementation defined so short of going and reading the code in the standard library that you're using, there's no way to tell ahead of time what this stack size is going to be. And if you're going to compile the same code with different compilers, then all bets are off. We are not the only people worried about these similar concerns, so the GameDev and low latency working group has something called inplace function. So inplace function is a std function alternative that is guaranteed that it will never allocate. It essentially uses this small buffer for everything and that's the direction that this talk is going to go in. But also, whilst we're kicking std function, Facebook's Folly library has a couple of other improvements. First off, it is a move-only alternative and I think this makes a lot of sense. If you think about how you use std function, it is almost always to define some functionality in one place and then pass it off somewhere else. If you're just going to define something and use it, you can use a lambda function, it would be much more efficient, and it's very rare that you really want to copy the functionality that you've defined. And the other thing it also addresses is the fact that it is very easy to violate const-correctness using std function. So a std function's call operator is marked as const, so if someone gives you a const reference to a std function, you can invoke something. However, once you've invoked that something, that something is free to modify it's own internals and as soon as you do that, all guarantees about const-correctnesses are out of the window. Your compiler will not be able to help you. So let's dive in. So what do we need if we're going to implement a std function? At the very high level we have a function class that is a template and once we know what the types of the things that it takes as arguments and the type of the thing that produces as a result, we can specialize that template. Note that here we don't have any information about the type of the thing that this function is going to hold, it's just the signature. So we're going to do some type-erasing. This is the approach that lib std C++ uses and it's a set of type arrays function pointers. So on the left hand side here, you can see that we have some type array storage, just an array of characters and we have some function pointers where the types of all these function pointers, is a sea of void, void stars, results type and your arguments. And in tandem with this, you have a set of type-aware functions. Now these functions are type-aware because they have a template type and this functor is going to be the type of the thing that we are storing. And so because we know this type, we can invoke the thing, we can copy construct this thing into a specific destination and we can destroy it. So the constructor is where all the action happens because this is the only place in the entire class where we actually know the type of the thing that we are going to wrap, and once we know that, we produce some specializations of each of those type-aware functions. When the compiler see this, it's going to go and stamp that out that out somewhere. If we've got a pointer to it, we simply cast this type-aware pointer down into our type arrays-ed pointers and then we're free to use them, so in the constructor, we give it an address of the storage we're using, same with the destructor, and same with the invoke. These are all just operating on type-aware memory, that these type arrays pointers are actually pointing to type-aware functions. So that's the nuts and bolts of how this thing actually operates but to make something useful, we want to put these things in containers. There's no point making a type arrays thing if we can't treat all of the things that we contained on the same footing. So for that, there's not too much interesting here. We just note that we now need to know the storage size as well because if we're assigning from a different function, we need to know the size of the thing that it contains so we can make our internal storage large enough to take it. So we've done all that. We want to know how this thing is going to perform so we've got a very simple benchmark here templated on this function type. This function type is gonna first be a std function and then it's going to be our replacement. We have an array of these things in a default state. Each one of them, we're going to assign it to be a function pointer and then we're just going to loop over them all again and produce a sum. And so, we see how we did, pretty terribly. So I had this guy in the greeny-blue is std function and the other side is our alternate. This shouldn't be too much of a surprise. If we jump back a couple of slides, then we can see every time we're calling a constructor or every time we're doing an assignment, we're having to reallocate our internal storage and calling out to main memory is very slow so the fact that std function is beating us comprehensively isn't much of a surprise. So how's it doing this was, as I previously mentioned, if they thing that you are storing is small and in this case we're just storing a function pointer, then std function is free to avoid an allocation by storing this directly in some stack space in itself. It's annoying because this is, as a non-configurable implementation defined size but if we're going to be building a replacement for std function, we want something that operates similarly so our users are not going to be surprised and it's pretty easy to add in. Don't worry about reading most of that, it's not particularly interesting, the interesting bits are down here where we have some stack space. If you haven't seen std aligned storage before, it's essentially a very thin wrapper around an array of characters. It's nicer than an array of characters because it has some same defaults for alignment on the platform that your running on. That makes more semantic sense. You don't need an array of characters, you're not gonna print it, you're not gonna read it. This gives you more a generic storage type which just more clearly indicates, this is just an area of stack that we want to use. I have a extra storage pointer here and this storage pointer is either going to be null for the default state, it's going to point to the heap if we're exceeding our stack size or it's going to point into ourselves to the stack. And so you see in the constructor, if the size of our functor is small enough, we use our internal storage and if it's too big, we use our external storage. And you can see on the right hand side that this, there's nothing difficult going on here but you can see the increase in complexity of this function when we have to decide where we're going to store our internals. This could be dried, so this is essentially the destructor and you could write this in terms of a swap function but I just wanted to really list all the operations that you would need to do to make this work. So if we benchmark this against std function, then we're doing more or less alright. We're a little bit slower but we haven't put any effort into optimizing this whatsoever, so again that is not surprising at all. So if we look at the size of our alternative implementation, we see we're pretty chunky. We have three function pointers, we've got some stack space, we've got an int for the heap size and we have an extra pointer as well. Now if you look at the lib std C++ implementation, it deals with this in a much more clever way. So the invoke pointer is in a union with the stack, the create pointer, destroy pointer, and some other functionality are all bundled into a type arrays management function, which does all of those operations. It just takes an extra argument which is effectively an op code telling it what it needs to do and the heap size and the storage pointer will abstract it away. But we're not gonna do that. We're just going to take a step back. If we look at what this code is doing, we have effectively written our own V table. A much nicer way to deal with this is just to use a V table. So this thing, this is how most people are more comfortable with doing type-eraser, so on the left hand side, we have a type arrays base class which is doing a similar kind of thing. So we have an invocation operator and we have a couple of methods for copying memory around, and we have a derived class that actually holds a concrete implementation of the thing that we are trying to wrap, this functor here. And we have one of that. Dealing it with it is very easy. We can just call it so we'll copy construct it into certain places. And actually the internals of our overall function wrapper are then much more simple. We just have some stack space and we have this function pointer, this base class function pointer which again is he's going to point onto the heap or into ourselves. And if we do this, then we effectively recover, with no extra effort, we've ended up being exactly the same performance as the std function. This is the std function from lib Cpp if anyone's interested. So now we have a std function replacement that is working at the same speed. What are we gonna do with it? So this is the point where someone may throw a spear out of the audience and I have to dodge it because adding declarations to the std namespace is undefined behavior but we've done it anyway. I know, I can hear that. (laughs) So this is dangerous because there's an unfortunate coupling between the std namespace and the compilers. The compilers are free to look inside there and if it sees something called std function, it at least reserves the right to make assumptions about the memory layout or how you operate with this thing. So we have taken some precautions, so we have a whole sea of pre-processor checks to make sure that we are running on a platform that doesn't already have a std function. Some other things that work in out favor is that, JUCE is distributed as source code so whilst it's perfectly possible to compile JUCE as something like a D-L-L and try and pass one of these things into it or out of it, if you're on a system which doesn't otherwise know about std function, it's very difficult to know what you're going to do with it on the other side. And because of this, it's been out in the wild for a couple of years and we've had no ABI issues or any other that we know about. So that's all well and good but it hasn't really solved the problem of this allocation and being safe to use on an audio thread and this is the simplest way of solving that problem. All we do is when we construct one of these things, if the size of the thing that we want to wrap, is greater than the size of the stack, we would just fail to compile. This gives us a real cast iron guarantee that when you use one of these things, it's not going to allocate. And another benefit of doing this, is that the internals of all our things, like copy constructor or assignments, become much more simple because we don't need to worry about where our internal functor is stored, like it could be on the heap or it could be on the stack, now it is always on the stack, there's much less branching. What this translates to is that actually we can get a pretty significant speed up over std function, so again, this is a pretty targeted benchmark so you are unlikely to see a 30% speed increase just from switching to this but nevertheless, it's notable. So we can do that and we can get a bit of extra runtime performance but in terms of our memory management, we are now doing terribly so there's a pretty good argument to making your std function replacements have a total size of either 32 or 64 bits so it lines up nicely with the cache lines on typical systems that JUCE users use. But if you do that, particularly if you make it 64 bites and you only store function pointers into it, most of that stack space is going to be unused so, the typical way of getting out of this jam, is something that std vector use, is just you add the ability to plug an allocator into things. And when you do that, you affectively delegate your allocation behavior to the allocator. So pre C++ 11 allocators were extremely awkward. They got a little bit better in C++ 11 and even better ones will be available in C++ 17 but we are firmly anchored to C++ 11 and even though they have improved, using them is really rather cumbersome and actually if you look around at a lot of audio codes, it's very rare that you see anyone doing anything with custom allocators but if you were, the general strategy would be that your allocator would allocate a big chunk of memory up front in a non-CRISP performance critical bit of your application, and then rather than dialing out to main memory when you need a bit of chunk, you just dip into this pre allocated pool. There's a slight performance overhead but it is nowhere near the cost of dialing out to main memory. A much better approach was presented by Louis Dionne at last years CppCon talk, where he was advocating really the ability to use things like the small buffer optimization in almost all C++ objects. It's a much more comprehensive approach, it's super powerful and if you've not watch the talk, you definitely should because it is really inspiring. However, you can get most of the benefit of using such an approach with much less effort and you can do that by just having more indirection. So now this function class here, it's a bit of a misnomer so you would not be able to use this function in the same ways you would a std function. So what happens here is this is a virtual base class and all the action, everything that I've showed you, has just been shifted into derived classes, but the crucial thing here is that these derived classes are free to change the signature, so you can have a templated stack size. Now if you know the sizes of all the things that you are going to put in your std function wrappers at the time that you construct them, then you actually have optimal memory management this way. And you can still put all of these things in a list or in a container because you can use pointers or references to the base class. Benchmarking this against std function is more tricky though because you can use them in different ways. If you had a container of std functions, you would store the std functions in that container but with this approach, the lifetimes are slightly different. You'd store the base class pointers in a container but the actual objects, these things, need to live elsewhere. Nevertheless, if you do shoehorn them into the same benchmark, you will see that the results are a little ambiguous anyway. Changing the C++ standard, changing the compiler, changing the standard library you're comparing to will all produce different results, and the upshot is sometimes std function is slightly faster in this approach, sometimes this approach is slightly faster. It probably evens out about the same. But this approach, you are guaranteed to not allocate. There is however, a fairly large elephant in the room and that is, what if you just don't do any type-eraser? So this is the world's least useful function wrapper in that it can only take function pointers and the only reason that we've created this is to put it into exactly the same benchmark as everything else went through. And if we do that, it's pretty conclusive. Because the compiler can see into the type, it can work out that this entire function always returns exactly the same number and we've squashed down to a single instruction, just return this single value. So there is a pretty massive caveat. If you are doing type-eraser, then you are telling the compiler that the thing behind this type-eraser, you can't know what that is. This going to be a runtime property and that can be a pretty significant performance killer, so the caveat is that you really should only use a type-erased callable wrapper if you really don't know the memory layabout of the callables that you need to store or you really need to store different callables in the same container. There are certain user cases where you need to do that but if you can not do it, then you will probably have a faster program. And a related note is that if you see a std function in a function signature, then typically if you instead use a templated type and then just call that type, your code will run faster. This is because again, the compiler can see the actual type of the thing that you are using and so you can get things like inlining. Of course, this does mean that the contract of this function has been blown wide open because there's no constraint on arguments or return types or anything like that, but often the overhead of having really horrible error messages when you try to compile is worse or at least that's more tolerable than having a slower program. So to conclude, rolling your own version of std function is not particularly difficult. We've talked about all the real pain points. My code is available and actually if you take a hatchet to your favorite standard library, working out what they do isn't too difficult either. And once you've done that, you can replace the internal memory management of std function with your own scheme and this lets you do things like enforce realtime safety at compile time by guaranteeing that you're not going to allocate. You can get a modest increase in runtime performance or you could do something much more adventurous, but the really important caveat is that, if you're able to restrict your callable object to those with a certain memory layout or type, then do not use a type-erasing wrapper. So that is it, there's a load of information there and I'll take any question. (audience applause) - [Man] Do you know the S-G 14 inplace function work? Is it the same, similar approach to your-- - I'm really struggling to hear you. - [Man] What? - I'm struggling to hear you. Can you get a bit closer to the mic? - [Man] Sorry. - (laughs) Sorry, I'm really struggling to hear you. - [Man] Okay, so do you know how the S-G 14 inplace function work? - I think it works in very much a similar fashion to what we've displayed there. (man speaking without a mic) - Okay, so it's effectively this thing but by itself, without a base class. Oh sorry, and the session is over, I overran a bit. (audience applause)
Info
Channel: CppCon
Views: 6,983
Rating: undefined out of 5
Keywords: Tom Poole, CppCon 2018, Computer Science (Field), + C (Programming Language), Bash Films, conference video recording services, conference recording services, nationwide conference recording services, conference videography services, conference video recording, conference filming services, conference services, conference recording, conference live streaming, event videographers, capture presentation slides, record presentation slides, event video recording
Id: VY83afAJUIg
Channel Id: undefined
Length: 31min 53sec (1913 seconds)
Published: Fri Nov 02 2018
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.