CppCon 2018: Kris Jusiak “State Machines Battlefield - Naive vs STL vs Boost”

Reddit Comments

That library looks promising, but I am a little worried about the maturity and support provided. Looking at the issue list it seems that the author more or less ignores any issue raised.

👍 2 · u/pklait · Oct 09 2018 · replies
Captions
- So let me tell you a story about why I started to care about state machines, and try to fix some misconceptions about them; that's where we'll start today. A few years ago I was an intern at a company. It was the Christmas period, I was ready to go home, and I noticed that a lot of developers were working really hard, so I asked them: it's Christmas, why are you working so hard right now? They replied that they had a contract with a country that doesn't have holidays over Christmas, so they had to be on call and prepare for any problems they might run into. So I went home for Christmas; I was an intern, after all. When I came back I asked them how it went. They said: well, Kris, we had a lot of issues, and one in particular took us the whole of Christmas to sort out. So they didn't get to go home, they were really annoyed, and it was bad. Let me explain the system they were working on. It was very simple: it took some inputs, manipulated them, and produced some outputs. Not rocket science. The problem was that they didn't have proper practices in place, and it turned out they had failed to reset a boolean variable in some switch, in some loop, in some function, introduced by a bad merge and a conflict.

I learned quite a few things from that. First: well, I didn't have to work over Christmas; they did. Second: they didn't have good practices, because they didn't test it; the way they found the bug was by reviewing the code. But the main lesson was that they didn't have a declarative way of expressing the application flow. They had to compare different versions through code review, the diff between working and staging, instead of being able to see upfront what was going on. So I asked them: why didn't you use state machines, for example? The answer was: performance issues, and other issues we didn't want to deal with. This talk is about clarifying those misconceptions and, hopefully, spreading the word about state machines.

So let's begin, but first let's introduce a problem. In a healthy company (not the one I was working at), we get requirements from our product owner or from a client. Let's do it in a TDD/BDD style. We have a feature: connection. We have a scenario: establish connection, a very simple TCP-like approach. Given I don't have a connection, when I receive a request to connect, then I should try to establish the connection; and when I get an acknowledgement, I should be connected. A very simple and easy way of defining the requirements, and obviously we have more requirements than that. How can we actually implement this? Let's take a look at state machines. There are different ways of modelling them, but here we'll talk about the Unified Modeling Language. Is everyone familiar with UML? Yeah, more or less everyone. It's a standard, specified at omg.org; you can read it, although I think it's even worse to read than the C++ standard, so I don't advise it. A lot of features are specified in it, so it's useful.
So we can translate our features and scenarios into a state machine. What does that look like? I guess everyone has seen pictures like this. We have an initial state, disconnected; on a connect event we call the establish action and transition to the connecting state. And if, for example, we are in connected and we get a ping event, then if the guard (written in square brackets) holds we call reset_timeout and stay in that state; otherwise we don't call it and we don't do the transition. Makes sense? Easy to follow; everyone knows this.

Today we'll try to implement this in many different ways. We'll look at naive solutions, and by naive I mean C++98 features: if/else, switch with an enum, and inheritance with the state pattern. We'll look at the STL: std::variant can be used to implement state machines, and so can coroutines. And we'll also take a look at the Boost libraries.

First, let's introduce some common pieces, shared across all the solutions, so that we don't repeat ourselves. The events will be simple structs; they could carry data as well, but we don't care about that here. We have a guard, is_valid, which takes an event and returns true by default; nothing special about it. And we have an action which just prints to the output. We use puts so that we can see in Godbolt whether it is inlined or not; otherwise we would get a lot of code from the iostreams machinery, which we don't care about.

So, the naive solutions. The first one is if/else. We start by implementing a connection class. It has three booleans to represent our state; the states are implicit, because they are encoded in the flow of the code. By default the disconnected flag is true and the others are false. When we process the connect event, we check whether we are in the disconnected state; if we are, we call establish, which is the action, and after that we have to reset all the state flags. That is the lesson from the story at the beginning: you always reset all of them, just in case, because in a deeply nested version you might forget one. The transition itself is just setting the connecting flag to true. What about processing ping? We check whether we are in the connected state, check whether the guard is valid, call reset_timeout, and stay in the current state. Makes sense, right?

What about Godbolt? As you can see on the left at the bottom, it is about 64 lines of code, not bad, not a lot of boilerplate, and everything is inlined and optimized very well, as you can see at the top. Why is it so well optimized? If/else is really easy for compilers to see through, and that is a great thing about it. It is inlined on GCC and on Clang; I am only showing Clang, but GCC produces the same output. There is no heap usage, which is great: we don't have to care about shared pointers or unique pointers or anything like that. The memory footprint is smallish, but not ideal, because we have three booleans; a state machine can only be in one state at a given time, so that is a waste. And it is hard to reuse and maintain, which I will illustrate in a moment.
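For reference, here is a minimal sketch of an if/else connection along those lines, assuming C++17 and made-up names (connect, established, ping, is_valid, establish, reset_timeout) chosen to match the diagram described above rather than the exact code on the slides.

```cpp
#include <cstdio>

// Events as plain structs (the common pieces described above).
struct connect {};
struct established {};
struct ping { bool valid = true; };

// Guard and actions; puts keeps the generated code easy to inspect on Godbolt.
constexpr bool is_valid(const ping& e) { return e.valid; }
inline void establish()     { std::puts("establish"); }
inline void reset_timeout() { std::puts("reset timeout"); }

class Connection {
  // Implicit states: exactly one flag should be true at any time.
  bool disconnected = true;
  bool connecting   = false;
  bool connected    = false;

 public:
  void process_event(const connect&) {
    if (disconnected) {
      establish();
      // Reset everything, then set the new state (easy to forget in nested code).
      disconnected = connecting = connected = false;
      connecting = true;
    }
  }

  void process_event(const established&) {
    if (connecting) {
      disconnected = connecting = connected = false;
      connected = true;
    }
  }

  void process_event(const ping& e) {
    if (connected && is_valid(e)) {
      reset_timeout();  // guard passed; stay in the same state
    }
  }
};

int main() {
  Connection c;
  c.process_event(connect{});
  c.process_event(established{});
  c.process_event(ping{});
}
```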
To illustrate the reuse problem: I guess everyone has seen code with multiple nested ifs and switches. How deep can you go? It probably depends on how many spaces you use for your tabs, but you can go quite far. Obviously it is hard to reuse, because those nested conditions are hard to implement and maintain. So it is not the best solution.

Let's look at a different one which is still very simple and which everyone has probably seen: switch with an enum. Everyone loves it, I guess, because it is so popular; you can see it in any legacy code base. It is a good fit for state machines in the sense that only one state can be active at a given time, and an enum expresses exactly that. It is pretty much the same as the previous solution: instead of boolean variables we use an enum with the three states disconnected, connecting and connected, with disconnected as the default. When we process an event, instead of ifs we switch on the current state. We break in the default case, and note that the default label can be placed anywhere in the switch. In the disconnected case we call establish, change the state, and break. Nothing really spectacular here, and the same goes for ping: if we are in connected, we verify that the guard is valid, we reset the timeout, and we break.

So far so good. What about the performance? Again, compilers are good at optimizing simple things, and a switch is one of the basic constructs C++ compilers handle very well. We have about 65 lines of code, very similar to if/else, and everything is optimized very well. So what else can we say about the switch? It is inlined, which is good. We fixed the memory footprint issue: it is one byte, which is fantastic. There is no heap usage. But it is still hard to reuse: we still end up with a lot of nested switches and ifs and loops, and it will be hard to add a new state and maintain it. It ends up as: you code for six minutes and debug for six hours. Even with good test coverage it can be difficult to find those bugs. So maybe that is not the best solution either.
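A rough sketch of the switch/enum version, under the same assumed event and action names as before; one enum (a single byte) replaces the three booleans.

```cpp
#include <cstdio>

struct connect {};
struct established {};
struct ping { bool valid = true; };

inline void establish()     { std::puts("establish"); }
inline void reset_timeout() { std::puts("reset timeout"); }

class Connection {
  // Only one state can be active at a time; the enum makes that explicit.
  enum class state : unsigned char { disconnected, connecting, connected };
  state current = state::disconnected;

 public:
  void process_event(const connect&) {
    switch (current) {
      default: break;  // the default label can go anywhere in the switch
      case state::disconnected:
        establish();
        current = state::connecting;
        break;
    }
  }

  void process_event(const established&) {
    switch (current) {
      default: break;
      case state::connecting:
        current = state::connected;
        break;
    }
  }

  void process_event(const ping& e) {
    switch (current) {
      default: break;
      case state::connected:
        if (e.valid) reset_timeout();  // guard holds; no state change
        break;
    }
  }
};

int main() {
  Connection c;
  c.process_event(connect{});
  c.process_event(established{});
  c.process_event(ping{});
}
```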
Let's try a different one. We all know that object-oriented design is great, right? We can use shared pointers everywhere and throw them around. We care about performance, so... we'll see. We have a state interface with a process_event overload for every event we can handle, and then we implement it on a per-state basis. That is the thing that actually gives us a lot of flexibility, because we can add a state and its transitions without changing the connection itself. That is valuable: different teams can work on different transitions and combine them together. So how do we do it? We have a disconnected state which inherits from the state interface; it has a constructor taking the connection it has to talk to, and in the connect handler (notice that I use final here, because that matters for possible devirtualization) we change the state to connecting. It is pretty much the same implementation we have seen before, just expressed differently. The same goes for ping: we implement the ping handler, and probably the other events too, to satisfy the requirements from the UML state machine we described before, but all in all it is very simple.

What about the performance? We have a bit more code, because it is object-oriented design: we write a lot of boilerplate just to satisfy the design. And unfortunately we have the vtable; it was not devirtualized, even though we used final. On both Clang and GCC we get around 220 lines of assembly. Not ideal, not perfect. And I am using the newest Clang and the newest GCC here; all the examples are on Godbolt, so you can check them out and experiment yourself.

So, what is good about the state pattern? As I said, it is easy to reuse and extend: to add a state you just inherit from the interface and add the transitions you want. That is cool, and it is object-oriented, which is what we wanted from it. But it has a highish memory footprint, because everything is allocated on the heap, and we have dynamic allocations all over the place, probably shared pointers or unique pointers, all that goodness. Because of that it is not inlined; it is not even devirtualized in our case, which is bad, because we care about performance. So although this approach is pretty good from the developer's perspective, it is not really good from the production perspective, because it does not give us what we really need.
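Before moving on, here is a minimal sketch of the state-pattern version described above. It simplifies one detail: the states receive the connection as a handler parameter instead of storing it via a constructor, and only the handlers a state cares about are overridden. Names are again assumptions.

```cpp
#include <cstdio>
#include <memory>

struct connect {};
struct established {};
struct ping { bool valid = true; };

inline void establish()     { std::puts("establish"); }
inline void reset_timeout() { std::puts("reset timeout"); }

class Connection;

// Interface: one virtual handler per event; unhandled events are ignored.
struct State {
  virtual ~State() = default;
  virtual void process_event(Connection&, const connect&) {}
  virtual void process_event(Connection&, const established&) {}
  virtual void process_event(Connection&, const ping&) {}
};

class Connection {
  std::unique_ptr<State> state_;

 public:
  Connection();
  void change(std::unique_ptr<State> s) { state_ = std::move(s); }
  template <class Event>
  void process_event(const Event& e) { state_->process_event(*this, e); }
};

struct Connected final : State {
  void process_event(Connection&, const ping& e) override {
    if (e.valid) reset_timeout();  // guard; stay in Connected
  }
};

struct Connecting final : State {
  void process_event(Connection& c, const established&) override {
    c.change(std::make_unique<Connected>());  // destroys *this; touch nothing afterwards
  }
};

struct Disconnected final : State {
  void process_event(Connection& c, const connect&) override {
    establish();
    c.change(std::make_unique<Connecting>());  // destroys *this; touch nothing afterwards
  }
};

Connection::Connection() : state_(std::make_unique<Disconnected>()) {}

int main() {
  Connection c;
  c.process_event(connect{});
  c.process_event(established{});
  c.process_event(ping{});
}
```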
OK, so we have gone through the naive solutions quite quickly; nothing really new there. Let's look at something else. If I could pick one thing that C++ is really good for, I would say it is writing libraries, and one of the best libraries available is the STL; I hope you agree. Since C++17 and C++20 we have at least two new ways of implementing state machines. One of them is std::variant. Who is using std::variant? Who likes it? It seems like about half the crowd. I think it is a very powerful idea, and we will see how it can be used for state machines.

So let's look at the connection again. We go back to a single connection class instead of one class per state. Now the states are implemented as plain types, structs, which is powerful, because they can carry data, which we could not do before with an enum or a boolean. Keep that in mind: although we won't show it here, each state can have its own data. Disconnected may have nothing, but connected could hold, say, the specific IP, which is active and available only while we are in that state, because a variant holds exactly one alternative at a time. That can be very handy. We declare the variant as the state, with disconnected as the default.

How do we handle the transitions? We visit the variant. overload is just a small helper, proposed for C++20, that picks which lambda to call, so we don't have to write much boilerplate; we could instead use a single generic lambda with if constexpr, but overload seems cleaner here. So if we match the disconnected alternative, it means we are in the disconnected state, and when we get the connect event (I guess you can all see the similarity to the previous solutions) we call establish and change the state to connecting. The next case exists only to satisfy the compiler, because the visitor is generated at compile time and has to handle every combination of state and event; for connecting and connected we simply discard the event, because we don't care about it in this transition. Makes sense? For ping it is actually even easier: we can check whether we are in the connected state using get_if, which is part of the std::variant interface, verify that the guard is satisfied, and call reset_timeout. We stay in the same state, so there is nothing else to do.

What about performance? On Godbolt, with the newest standard library and Clang 7, it actually gets inlined, which is fantastic. It is not as good on GCC: GCC does not inline it at all, and we get about 140 lines of generated assembly with a mysterious vtable in there. It takes about 70 lines of code to implement, and the implementation is very similar to the previous ones; essentially it is switch and enum, except that instead of enums we use types, which gives us RAII and a lot of other benefits.

To sum up the variant: the memory footprint is quite small. Each state struct is one byte, so the variant stores the max size of all the states, effectively one byte in our case, plus something to record which alternative is active, which depends on the implementation; it might be one byte, eight bytes, whatever. The footprint stays small, and it is efficient, because we can put data in those structs: an id, the last ping, whatever we need. Variant also integrates very well with expected and the static exceptions ideas, because we can return an error as just another alternative of the variant, or as an expected with an error case, which we could not do before. Exceptions are disabled here for performance reasons, but error handling matters, and handling errors with an enum or a switch is not ideal; you end up with exceptions, return codes, or global variables. With variant we can simply return something unexpected, which is nice. It is inlined on Clang, which is good; it is not inlined on GCC, which is bad, but there is room for improvement, and it is quite a decent solution for simple state machines. It does not offer many capabilities beyond simple transitions, but it is still very useful, and the efficient memory footprint is neat. It is quite hard to reuse, though, because it is so similar to the switch and enum approach: if you have nested state machines, hierarchical or composite state machines, you end up with a visit inside a visit inside a visit. You see the pattern; it is not much better than if/else, if/else, if/else.
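A sketch of the std::variant version, assuming the usual overload helper (it is not in the standard library) and the same made-up event names.

```cpp
#include <cstdio>
#include <variant>

struct connect {};
struct established {};
struct ping { bool valid = true; };

inline void establish()     { std::puts("establish"); }
inline void reset_timeout() { std::puts("reset timeout"); }

// States as plain types; each could also carry per-state data (e.g. an IP).
struct Disconnected {};
struct Connecting {};
struct Connected {};

// The usual overload helper: std::visit picks the matching lambda.
template <class... Ts> struct overload : Ts... { using Ts::operator()...; };
template <class... Ts> overload(Ts...) -> overload<Ts...>;

class Connection {
  std::variant<Disconnected, Connecting, Connected> state{};

 public:
  void process_event(const connect&) {
    std::visit(overload{
      [this](Disconnected) { establish(); state = Connecting{}; },
      [](auto) {}  // ignore the event in every other state
    }, state);
  }

  void process_event(const established&) {
    std::visit(overload{
      [this](Connecting) { state = Connected{}; },
      [](auto) {}
    }, state);
  }

  void process_event(const ping& e) {
    // get_if is enough when we only care about one state and never transition.
    if (std::get_if<Connected>(&state) && e.valid) {
      reset_timeout();
    }
  }
};

int main() {
  Connection c;
  c.process_event(connect{});
  c.process_event(established{});
  c.process_event(ping{});
}
```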
OK, so let's get to the good part, and the newest part, which we can actually use in C++20: coroutines. Coroutines are another new way of implementing state machines. Has anyone experimented with coroutines so far? Tried them out? A few hands; not a lot of hands, actually. So, coroutines; let's dig into them a bit.

Notice that in our case the connection is not a struct any more; it is a function. That is the first difference when using coroutines for a state machine: coroutines are functions, they are called resumable functions, and we use them basically as functions. Notice that on the second line we have an infinite for loop. That is odd, right? Why would we want an infinite for loop? We need it because coroutines can suspend and resume, which is what co_await is for, and if we don't get the event we are waiting for, we want to go around the loop and suspend again on the co_await. It is a different way of thinking, but you can think of the state as the position in the function: where we currently are in the code.

On the third line we co_await the event. If we get it and the event is connect, we call establish and move on, which means we are now further down in the function. If we get an event we don't handle, we go around the loop and co_await suspends us again until the next process_event call resumes us. Notice that I am using this in a single-threaded application: coroutines are often associated with asynchronous ideas, but you can use them synchronously, very efficiently, as well. There is nothing multi-threaded going on here, although there could be. That is important: you can use coroutines in a single-threaded way.

Now we run a second loop, because we want to be a bit further down in the function. We never go back to the first loop in this example, because we have transitioned past it; when we are resumed, we are resumed at the co_await inside the second loop, not before it. I am trying to make this clear, but it is genuinely hard to think about if you don't know coroutines: we will be resumed at that co_await call and nowhere else. We loop because we have to check whether we actually got the established event or not. I hope that makes sense.

There is a question: what is the end label for? We will get to that in a second, because we need it for the next part. As you can see, we have a lot of for loops, and the best way to get out of a nested for loop is goto, right? So I hope that answers the question: the label is there so we can jump to the end. It is a different way of thinking. You have to think of it as: we have this function, we are at some position inside it, we get resumed at the point where co_await was called, and then we react to the event; all the hassle with continue, break and goto is just to move to a different loop in the function, because we always stay inside that function. So yes, it is possible with coroutines; that particular condition is quite awkward, but, as mentioned, we use goto, so maybe that is fine. The states are really implicit here, by the way; a rough sketch of the whole approach, boilerplate included, is below.
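Here is a single-threaded sketch of the coroutine approach. The resumable type, the awaiter and the event variant are my own scaffolding, written against the standardized C++20 <coroutine> header; the talk predates that header and its boilerplate looked different, so treat the details as assumptions rather than the code from the slides.

```cpp
#include <coroutine>
#include <cstdio>
#include <variant>

struct connect {};
struct established {};
struct ping { bool valid = true; };
struct disconnect {};

using event_t = std::variant<connect, established, ping, disconnect>;

// Minimal resumable type: the connection coroutine suspends on co_await and
// is resumed by process_event() with the next event.
struct state_machine {
  struct promise_type {
    const event_t* current = nullptr;
    state_machine get_return_object() {
      return state_machine{std::coroutine_handle<promise_type>::from_promise(*this)};
    }
    std::suspend_never initial_suspend() noexcept { return {}; }  // run to the first co_await
    std::suspend_always final_suspend() noexcept { return {}; }
    void return_void() {}
    void unhandled_exception() {}
  };

  // Awaiter: always suspend; when resumed, hand back the event just delivered.
  struct next_event {
    promise_type* p = nullptr;
    bool await_ready() const noexcept { return false; }
    void await_suspend(std::coroutine_handle<promise_type> h) noexcept { p = &h.promise(); }
    const event_t& await_resume() const noexcept { return *p->current; }
  };

  explicit state_machine(std::coroutine_handle<promise_type> h) : handle(h) {}
  state_machine(state_machine&& o) noexcept : handle(o.handle) { o.handle = {}; }
  ~state_machine() { if (handle) handle.destroy(); }

  void process_event(const event_t& e) {
    handle.promise().current = &e;
    handle.resume();  // resumes at whichever co_await is currently pending
  }

  std::coroutine_handle<promise_type> handle;
};

// The state is simply the position in this function.
state_machine connection() {
  for (;;) {                                       // disconnected
    auto e = co_await state_machine::next_event{};
    if (!std::holds_alternative<connect>(e)) continue;
    std::puts("establish");
    while (!std::holds_alternative<established>(   // connecting
               co_await state_machine::next_event{})) {}
    for (;;) {                                     // connected
      auto e2 = co_await state_machine::next_event{};
      if (auto* p = std::get_if<ping>(&e2); p && p->valid) {
        std::puts("reset timeout");
      } else if (std::holds_alternative<disconnect>(e2)) {
        std::puts("close");
        break;                                     // back to disconnected
      }
    }
  }
}

int main() {
  auto sm = connection();
  sm.process_event(connect{});
  sm.process_event(established{});
  sm.process_event(ping{});
  sm.process_event(disconnect{});
}
```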
As you can see, I marked the states with arrows, because we don't really need to name them: the state is represented by where we are in the function, which is intriguing, right? It is not how we usually think about states, but maybe it is worth looking at a bit further. What about the performance? There is more code, and more boilerplate to write by default, because the library facilities are not there: coroutines are a core-language feature, and satisfying the required interfaces takes quite a bit of hassle. You can check it on Godbolt and try it out; the interface is quite difficult to deal with, and different proposals are trying to address that. And it was not optimized very well. Coroutines can be optimized, co_yield in generators in particular can be optimized very well, but co_await here is not, and we pay for heap allocations. Not great.

To sum up the coroutines: the structured code is really, really neat. We can use ifs, elses and for loops, not exactly in the way we would like, but we can still use them. So anyone new to the language doesn't have to learn libraries or anything like that; they just use the language and understand that co_await resumes you at the point where you were suspended. Pretty cool. As I pointed out, you can use them in a synchronous environment, but you can also use them in an asynchronous environment very easily, which is not really the case for the other solutions presented here. There is definitely a learning curve; it is a different way of thinking, so you have to get used to it, but it might be worth it. It requires a heap, which might not be the case in the future. I only checked the Clang implementation, because GCC does not support coroutines; I am sorry, I did not bother with Visual Studio, but I would not expect it to be faster than the Clang implementation. Even Gor said that the std::experimental implementation in Clang is the one doing it right. So the allocation was not elided, unfortunately. We have these implicit states, which is good or bad depending on how you look at it: where we are in the function tells us which state we are in. And co_await returns a type, so it has to be a common type of sorts, because we cannot return different types unless we use a variant or something like that, so that is not ideal either. And I have to admit that these for loops are quite weird.

OK, so let's try goto, just for the fun of it, because we can. It is pretty much the same example, except that instead of the nested for loops we only need the first one, because the state machine has to run forever; beyond that we don't actually need them. Instead we can have labels, if we really want. A disconnected label means the disconnected state; a connecting label is the connecting state; and after that we go to the connected state. Everything is described by a label, and we jump between them with goto, which is arguably even neater when you have really nested loops and don't want to keep checking booleans to exit them. Is it a good solution? I don't know, but it is definitely a workable option with coroutines: the connection looks like a function, and changing the position in the function can be done with goto.
Please don't quote me on that; I am not saying goto is the solution here, just that it is an option you can explore yourself. If it comes to performance, it is still the same amount of code, and we still have the allocations; goto did not help with that, and we did not expect it to, I hope. So, the summary: we don't have the nested infinite loops any more, which is pretty cool, because we did not like them. We have explicit states, which we can treat as positive or negative depending on whether we liked them before. And we have goto, and the story with goto is that you never know where you will end up. So maybe goto is not the best idea.

There is one more variation we can use: coroutines with functions and a variant. If you think about the disconnected state as a function, we can come back to our infinite for loops, but have each state represented as a separate function, and use co_return to go to the next state, which is quite handy because we can split things into separate functions. It is pretty much the same for any other event or state transition; the connected state uses a for loop for the same reasons as before. There is a question: where does the in come from? Those functions are part of the connection class, which has the in as a member; sorry about that. We are in a slightly wider scope; we could pass it into the functions, but basically the connection class owns it and these functions use it. Performance-wise this did not help; it actually made things worse, because we used a variant as the result type of co_await to get around the common-type problem, and that did not pay off. So: we can use functions quite easily, we can add a new state's behaviour, we have typed events, and it is fancy. It is a bit like Apple's new headphones: they easily get lost, and you can easily get lost with coroutines too, I guess.

So, what about Boost? We talked about libraries: the STL is great, and Boost is sometimes even better; sometimes it is not. Let's compare a few more solutions. We will look at Statechart and MSM, which are available in Boost; you can download them and use them right now. And SML. SML is the library I am the author of; it is not a Boost library, it is just called Boost to confuse everyone... no, the reason is that it aims to become a Boost library. Sorry about that.

So, let's look at Statechart. Has anyone used Statechart? Not a lot of you. So how do we implement a state machine in Statechart? We have to change our events to inherit from the Statechart event, which is a CRTP base. Not the best start; it is more boilerplate, but we can deal with that. Statechart is closer to the state pattern, an object-oriented kind of design, so our connection has to inherit from the state machine; the first template parameter is the CRTP derived class and the other one is the initial state, so you need a bit of knowledge to use it, and all the actions have to be put into the connection implementation in order to be usable from Statechart. Then, as with the state pattern, we implement everything on a per-state basis.
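Here is, roughly, what that per-state Statechart code looks like; the walkthrough continues below. This is written from memory against the Boost.Statechart API (sc::event, sc::state_machine, sc::simple_state, sc::transition, sc::custom_reaction), so double-check details such as the transition-with-action template parameters against the library documentation.

```cpp
#include <boost/statechart/event.hpp>
#include <boost/statechart/state_machine.hpp>
#include <boost/statechart/simple_state.hpp>
#include <boost/statechart/transition.hpp>
#include <boost/statechart/custom_reaction.hpp>
#include <cstdio>

namespace sc = boost::statechart;

// Events inherit from sc::event via CRTP.
struct connect : sc::event<connect> {};
struct established : sc::event<established> {};
struct ping : sc::event<ping> { bool valid = true; };

struct Disconnected;  // initial state, declared up front

// The machine owns the actions.
struct Connection : sc::state_machine<Connection, Disconnected> {
  void establish(const connect&) { std::puts("establish"); }
  void reset_timeout()           { std::puts("reset timeout"); }
};

struct Connecting;
struct Connected;

struct Disconnected : sc::simple_state<Disconnected, Connection> {
  // Transition: on connect, go to Connecting and call Connection::establish.
  using reactions = sc::transition<connect, Connecting, Connection, &Connection::establish>;
};

struct Connecting : sc::simple_state<Connecting, Connection> {
  using reactions = sc::transition<established, Connected>;
};

struct Connected : sc::simple_state<Connected, Connection> {
  // Custom reaction: run the guard ourselves, then discard (no transition).
  using reactions = sc::custom_reaction<ping>;
  sc::result react(const ping& e) {
    if (e.valid) context<Connection>().reset_timeout();
    return discard_event();
  }
};

int main() {
  Connection c;
  c.initiate();
  c.process_event(connect{});
  c.process_event(established{});
  c.process_event(ping{});
}
```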
Each state inherits from simple_state, parameterised on itself and on the connection it belongs to, and declares its reactions, which are basically the list of transitions; you just have to be able to read them. So this one is a transition from the disconnected state, on a connect event, to the connecting state; Connection is the class and establish is the action, so it is basically the same as before, just less easy to follow, I would say. We can also have a custom reaction: at the bottom, on connected, ping is a custom reaction, which allows us to run the guard and call whatever we want; it lets us do anything with Statechart inside the transition. We have to discard the event if we don't transition; otherwise we would just call transit. Performance: about 60 lines of code and somewhere between 2,000 and 3,000 lines of generated assembly. Not great. To sum up: it has a lot of UML-related features which we don't even cover here, which is good. There is a learning curve, which is not great. And it has dynamic allocations, dynamic dispatch, and a high memory footprint.

So MSM was introduced into Boost, and it is much faster. However, it is really based on macros. There are different front ends for MSM; I am using this one because I find it the most expressive, and expressiveness and performance are what I really care about. You define the events with a macro, there is another macro to define a state, another macro to define actions, and we have to pass all these parameters around for guards; a guard is just an action that returns a boolean. It is quite a lot of boilerplate, but all of that boilerplate is there for one thing, and that thing is awesome: the transition table, in which we can actually read and reason about the flow of our program. If you compare all the solutions, this one, in my opinion, is so much better: you can easily read it. You read it as: I am in the disconnected state, when I get a connect event I call establish and I go to connecting. That is really powerful to reason about and look at. So all those macros are worth it just to get that table, in my opinion, because it gives us a really easy way to reason about the application flow. Obviously there are more macros on top of these.

What about the performance? It is not a lot of code, about 80 lines, and there is quite a bit of generated assembly, but it is not bad assembly. It is not always about size when it comes to assembly: you can see a jump table generated at compile time, so all the transitions are just jumps. It is not inlined, but when it comes to performance we will take a look at that as well; just keep in mind that longer assembly is not always bad, unless you specifically want the inlined version. The summary: it is declarative and expressive. As you have noticed, I hope, I love the transition table; I think that is the way to go. Dispatch is O(1), via the jump table built at compile time. It has UML features and a small memory footprint. There is a learning curve; it is DSL-based, and not everyone likes DSLs. It is macro-based, which is terrible; we don't want that. And it compiles very slowly, so if you run it on Godbolt you will get a timeout; to get the results you have to run it locally, but you can still click through.
I don't know, maybe Matt will extend the time limit, I don't know. It is very slow to compile. And the error messages... well, you don't want to look at them; it is MPL, C++98-style. So that was the main reason I implemented a slightly different version of MSM using modern C++ features. At first I started by extending MSM, but that was just quite painful, because it is C++98. In SML we can do better. I think it is even more powerful than MSM because, well, we don't have macros. And we don't have gotos, which is even better. Everything is easy to follow, and it is C++14. We have the transition table, really easy to read and follow. Does everyone agree? Do you find it easier to follow than nested if/else and switch? Yeah, I see the thumbs up; I like it.
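A sketch of what the SML front end looks like. It follows the connection example from the [Boost].SML documentation closely, but it is reproduced from memory, so treat the exact guard and the extra transitions (timeout, disconnect) as illustrative.

```cpp
#include <boost/sml.hpp>
#include <cstdio>

namespace sml = boost::sml;

// Events
struct connect {};
struct established {};
struct ping { bool valid = true; };
struct disconnect {};
struct timeout {};

// Guard and actions as plain callables
const auto is_valid      = [](const auto& event) { return event.valid; };
const auto establish     = [] { std::puts("establish"); };
const auto close         = [] { std::puts("close"); };
const auto reset_timeout = [] { std::puts("reset timeout"); };

struct Connection {
  auto operator()() const {
    using namespace sml;
    // The transition table: source + event [guard] / action = destination
    return make_transition_table(
      *"Disconnected"_s + event<connect>                  / establish     = "Connecting"_s,
       "Connecting"_s   + event<established>                              = "Connected"_s,
       "Connected"_s    + event<ping> [ is_valid ]        / reset_timeout,
       "Connected"_s    + event<timeout>                  / establish     = "Connecting"_s,
       "Connected"_s    + event<disconnect>               / close         = "Disconnected"_s
    );
  }
};

int main() {
  sml::sm<Connection> sm;
  sm.process_event(connect{});
  sm.process_event(established{});
  sm.process_event(ping{});
  sm.process_event(disconnect{});
}
```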
But we don't want it just for the sake of being able to write expressive code, because C++ is all about not paying for what you don't use, right? We care about performance. We want the best performance possible, and we want to be able to pick how to optimize it. The interface is pretty neat, so let's keep the interface and give ourselves the ability to change the strategy for how events are dispatched. SML has different policies, chosen at compile time; we don't pay for anything at run time in SML, but at compile time we can choose which dispatching strategy we want. We can have a jump table, generated at compile time like MSM's. We can have a nested switch, if/else, or fold expressions; we will take a look at those. There is no coroutine back end yet, but coroutines could be an option as well if you really wanted. I think it is a genuinely neat idea to be able to switch the back-end policy behind a nice common interface, because when it comes to performance you always have to measure, right? It is not a given that the jump table will be the best, and from my experience it often is not.

So let's look at the dispatching policies. If/else first. Assume we have some mapping of states, and a dispatch call which takes the current state and the event; that is the back-end interface for the dispatching policy. What can we do with that? We can call it recursively for as long as it takes to find the current state. If we are at the current state, we call execute, which is just the transition; if not, we call ourselves again with the next iteration and the next state, so that we eventually find it. If we run out of states, we don't call any transition; the event is simply unexpected. That gets optimized very well: on Clang it gives exactly the same code as the hand-written versions from before, which is fantastic. For switch, we do yet another trick, a nested switch: in the default case we call dispatch again, which switches on the next id, and so on, so we get a nested switch generated at compile time; when we hit the case we care about, we call execute. Is that optimized? Yes it is: on both Clang and GCC it gives the same result as the hand-written switch did. Notice that on the left side I highlighted how to select the policy; by default you don't have to pass one at all, and the policy will be the one the compiler can optimize best.

Then there is the jump table. The jump table is not straightforward, but it is not difficult either: we have the states, we have a dispatch-table type of function pointers, we generate an array of them for all the events, and then we simply jump according to the current state for the given event. That gives us more code, less than MSM actually, but it will not be fully optimized on either Clang or GCC; from a performance perspective, though, it is sometimes better than the inlined versions. It depends; we have to measure. And yet another option is fold expressions, a C++17 addition: we just fold over all the available events with their ids and execute the one that matches. It is pretty much the same as the recursive version, just not recursive, and fold expressions are actually optimized very well too, giving the same output as if/else or switch with an enum. Handy to have. (A standalone sketch of this idea appears a little further down.)

So, the summary. The main benefit of using a library like SML is being able to be expressive: we have the UML transition table visible, the declarative flow, and being declarative is always better than not. We can customise it at compile time, which is what C++ is all about, right? It is inlined; that part is fantastic. It has fast compilation times; you will see that in the benchmarks in a second. It has additional features which we don't talk about here; it is a library, after all, so it is much more powerful than just the transitions. Compare that with the variant, which gives you a way to express the transitions but nothing else. There is a minimal memory footprint: the size of the connection is just one byte. There is a learning curve; obviously, it is a library, and it is DSL-based, which is good or bad depending on how you look at it.

So, if we sum up the solutions, we can classify them by how the state is represented, how the transition table is represented, and how the transition is represented. The state might be represented by a boolean variable as in if/else, by an enum, by a class, by a union or variant, by a function, or even by a type as in SML. The transition table might be per state, which means an object-oriented design with one piece per state, or global, where you see all of it in one place. And the transition is either implicit or explicit. Implicit means you have to hunt through the code to find where the state actually changes; explicit, which is usually better, means you can look at one small piece of source code, one file, a few lines, and see everything that happens, with no surprises afterwards.
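As referenced above, here is a standalone sketch of the fold-expression dispatch idea. The transition entries and the dispatcher are hypothetical illustrations of the technique, not SML's actual back end.

```cpp
#include <cstdio>

// Hypothetical transition entries: each knows the state id it fires from
// and what to execute.
struct disconnected_on_connect {
  static constexpr int src = 0;
  static int execute() { std::puts("establish"); return 1; }  // -> connecting (1)
};
struct connecting_on_established {
  static constexpr int src = 1;
  static int execute() { return 2; }                          // -> connected (2)
};

template <class... Transitions>
struct dispatcher {
  // Fold over all transitions; the one whose source matches the current
  // state runs and reports the new state.
  static int dispatch(int current) {
    int next = current;
    (void)((Transitions::src == current
                ? (next = Transitions::execute(), true)
                : false) || ...);
    return next;  // unchanged if nothing matched (unexpected event)
  }
};

int main() {
  using on_connect     = dispatcher<disconnected_on_connect>;
  using on_established = dispatcher<connecting_on_established>;

  int state = 0;                              // disconnected
  state = on_connect::dispatch(state);        // establish, -> 1
  state = on_established::dispatch(state);    // -> 2
  std::printf("final state: %d\n", state);
}
```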
OK, so let's take a look at the benchmarks. How do these compare? I have been talking about these solutions for almost 50 minutes, and we compared the Godbolt output, but we have to measure to be sure which one is best for the measurements we are going to take. To do that, we change our actions and guards to access memory, so that they will not be inlined away entirely, and the idea is that we randomise the events and run process_event on them, with GCC and Clang. The results shown are for Clang, but with GCC it is basically the same, apart from the variant; and we have to use Clang because we only have coroutines in Clang, nowhere else for now.

Lines of code: the fewer, the better. SML is really good here, because it is optimized for being expressive. The state pattern is not the best one: as usual with object-oriented design, we have to write a lot of boilerplate. MSM, although really expressive, also has a lot of boilerplate to write with those macros. Besides that, it is fairly even.

Assembly lines: that is what we compared on Godbolt earlier. The naive switch and if/else, and SML (since it uses a switch or ifs underneath in these cases), are very well optimized and inlined. Statechart: if you care about performance, don't use it; it is not the best one. Coroutines and variant are not terrible, but not ideal either; it obviously depends on your case. If you really care about performance, you probably don't want to use them yet, but if you are not that bothered, you can.

Then the main benchmark, which shows the run-time performance. As I said, having longer assembly does not mean it will be faster or slower. MSM has really long assembly because it generates the jump table at compile time, but it is actually quite fast. Variant is quite fast as well on Clang, not necessarily on GCC. Statechart is slow, but we did not expect it to be fast, since it generates so much code. SML, switch and if/else are very well optimized, because they are really easy for the compiler to follow. The state pattern: well, if you care about performance, you don't really want virtual dispatch, right? It might be devirtualized sometimes, but most likely it won't be, and you will have the heap and shared pointers and all that badness to deal with. So you don't want that if you care about performance. Coroutines are quite decent; they could probably be better, and this is just the first stage of them being introduced, so maybe at some point they will improve.

Instructions per cycle: the more instructions we can handle per cycle, the better; it shows how well we actually utilize the CPU. If you execute fewer instructions overall you might not score as well here, but it shows that all the solutions are more or less the same. Coroutines are a bit lower than the others, but having a value around two is really good to see. Branches: it is interesting how many branches MSM has because of the jump table. Besides that, the inlined versions have more branches than the others, but that does not mean they will be slower, because it is all about how many misses you get. And as you can see, with the state pattern and its virtual dispatch we don't predict them very well at all; CPUs are not really good at predicting where we are going there. In the other cases it is much better. Statechart, since it also uses virtual dispatch, has quite a few misses too, but still not as bad as the state pattern.

Compilation time: that is what I was referring to earlier. MSM compiles very slowly in comparison to any other solution, which is not good. You can be productive with MSM thanks to its declarative style, but you pay for it in compilation time. A lot. Really, a lot. SML compiles quite fast.
In release builds it is pretty much the same: MSM is very slow to compile. And the size, because we care about the size as well: if you care about performance, you probably care about binary size too, and we may have restrictions on it. MSM produces quite huge binaries in debug mode because of the debug symbols; as I said, it is all MPL-based, so it produces a lot of code in comparison to the other solutions. But the release executables are really small in comparison to MSM's debug ones, which is really good; it means everything just gets thrown away. So, we have about ten seconds left, and the mission is to embrace zero-cost state machine libraries. And with that, we have exactly zero seconds for questions, so thank you. (attendees applauding)
Info
Channel: CppCon
Views: 17,598
Keywords: Kris Jusiak, CppCon 2018, Computer Science (Field), C++ (Programming Language), Bash Films, conference video recording services, conference recording services, nationwide conference recording services, conference videography services, conference video recording, conference filming services, conference services, conference recording, conference live streaming, event videographers, capture presentation slides, record presentation slides, event video recording
Id: yZVby-PuXM0
Length: 60min 2sec (3602 seconds)
Published: Mon Oct 08 2018