Modern C++: C++ Patterns to Make Embedded Programming More Productive - Steve Bush - CppCon 2022

Captions
Welcome, everybody, to my talk on C++ for embedded systems. My name is Steve Bush. A little bit of background on today's talk: I'm going to be going through a pretty breezy set of topics on embedded firmware and the C++ language, so we're going to be working on breadth rather than depth. We're going to cover a lot of things; hopefully you will enjoy them and take some tidbits back to your job or your hobby. Like I say, it's going to be a bit more breadth than depth, and in keeping with Bjarne's talk earlier this week on C++ in constrained systems, for embedded developers it's maybe less about going into the deep, dark recesses of the language and more about picking out the really good parts that make us productive without a lot of cost. So let's go ahead and get started.

A little bit about me: I'm Steve Bush, a research fellow at the Procter & Gamble Company. Maybe you've never heard of our company so much, but P&G is a global manufacturer of consumer goods. We make a lot of brands that you may have in your home in terms of cleaning and household care products. I'm an embedded product developer. I work in an upstream R&D division of P&G where we work on connected and smart products, so I do hardware development and a little bit of everything, mechanical too, but one of the things I'm really passionate about is C++ and making firmware development a bit more of an art form, as much as possible in our constrained systems. At P&G we have a number of brands that you may be familiar with, probably some that you have in your home. We have a lot of cleaning products, baby care, oral care, grooming, and so forth, so you probably see brands here that you recognize. We are a consumable products company, but at the same time we do have a significant business in devices, so think of our
grooming businesses with shavers, Oral Care with its toothbrushes, and household care with some of their freshening and cleaning durable products. As I said, I work in an upstream division where we look at smart and connected devices. These are some of the devices that are near and dear to my heart; we've shown them at CES, formerly the Consumer Electronics Show. In the upper right is the Airia product, a smart connected air freshener that doesn't just do one room in your home but your whole house. Opté, in the lower left, is a makeup dispenser that uses a smart system integrated with a camera to basically do Photoshop for makeup on your face: Photoshop for your face. One that's really near and dear to my heart is the Oral-B toothbrush, the iO10, that's getting ready to come out. It's a really fabulous product. It has an onboard artificial intelligence, or machine learning, algorithm that interprets movements of the brush as positions in your teeth and gives you feedback on how you're doing in brushing. Like I say, I'm really proud of that. We've got lots of pretty cool code running in that product, and I'm going to use it as inspiration for the examples that I'll be talking about later in the talk.

So today's objective is really about making embedded code easier and more pleasant to write and maintain. The things we'll be talking about are not super sophisticated, but hopefully they are things you can take home to make your code easier to write, read, and understand, and therefore make it an object of envy for your embedded friends who use the C language. For the agenda, we're going to cover, like I say, a lot of topics: GPIO configuration, compile-time lookup tables, an interesting way to look at numeric constructs and data structures, stream-based I/O, heap memory management and when it's
appropriate, and some library features like std::chrono and std::random. I've got a lot to cover, so honestly I'm not going to take questions along the way, but we'll see if we have a little bit of time at the end. My commitment to you is that I'll be on the Discord, for those who are watching this live, to make sure you get your questions answered. By the end of the week there will be a companion repository up in my personal GitHub, sgbush, so you can refer back to the slides and come back to this link. I'll have some example code set up; it'll be very short snippets, no grand examples or anything, just very short snippets that hopefully get the point across.

Let's dive right into it. First is configuring GPIO declaratively. GPIO, the I/O pins on our embedded product or microcontroller, is typically the base thing we need to do first in configuring our system. This is the old way, the way we see it in example code or code generators: you have an init structure, and then you start filling it in with things like the pin number, the mode of the pin (is it push-pull or open-drain, the electrical modes), the speed, and whether it has an alternate function. For an alternate function, maybe the pin is connected to another peripheral in the chip, like the UART or an SPI bus. Having filled in the blanks, we procedurally configure that I/O pin, and, oh, by the way, don't forget to do a write, so that you write the correct logic level to that pin. You don't want to accidentally turn on a motor or an indicator light when you don't mean to during configuration. And remember that a lot of these are plain old constants, so they're not strongly typed, and so
there's a potential for error there, and the syntax highlighting doesn't help you out much. Since this is procedural and we have a lot of I/O, we pretty much go through the same procedure again, rinse and repeat: set some more fields, do a configure, do a write, and so forth.

So here's a better way of doing it, at least the way we do it in some of our code bases. We create what I call a GPIO def structure, which is again a structure, and it encapsulates everything we know about our GPIO peripheral. First is the actual hardware port itself and then the pin number on that port. Then we have a bunch of enums that describe the function we want out of that pin, whether it's input or output or maybe an alternate function that's connected to another area in the chip, then other parameters that say what the speed is, and finally, at the end, the initial state we want that I/O pin to end up in. What we can do is use a std::array, and of course this can be declared statically with no dynamic allocation. All we do is have a std::array of GPIO defs, however many we want, and then we declaratively show what we want our GPIO pins to be. We have our port B, pin zero; we describe its input/output function and so forth; and we use enums to give us a little more strictness in typing. We can do multiple of these all in one go, so basically we declare all of our I/O and its functions in one spot. Just a word about this approach: we use weak (unscoped) enums, and that's just a design choice; it's not required, and you can choose to go another way. We use weak enums because they convert implicitly to their underlying type, so if you're smart about setting the enums' numeric values, they can be helpful in setting the registers when you configure the I/O, but they're
constrained to the type scope within that GPIO def structure, so using them in an inappropriate setting is maybe a little more difficult to do. We find that the syntax is pretty expressive and easy to maintain, and it's pretty easy to consolidate the entire project's I/O configuration in one place; indeed, that's the approach we want to take, as I'll describe later.

On configuring and using I/O definitions: as I said, we prefer to configure all in one go. We have our I/O declared declaratively, so we want to do that all in one go. Not preferred is passing those I/O defs around to delegates and having the delegates figure out what they need to do with the I/O; a bit more detail on that is coming up. In this approach we have to write one function that is a bit tedious to write. This is a configure function that takes a GPIO def as an argument, and in this function you would sit down with your 2000-page reference manual, go through the registers, and convert all those enums and so forth to register settings. It's a bit tedious to write, but the good news is that you only really have to write this function once, and it's probably good for the entire family of microcontrollers that you're hoping to use. Having written this function, you can write another function, which is much easier to write. This is a configure function that takes a pair of iterators, begin and end, the classic STL iterator way, so you just iterate through that std::array, that iterable container, and configure the I/O pins one by one. Working our way a little more toward the modern end, we can also write a configure function that is std::ranges-aware. We have a template, and we use my new best friend, a requires clause, which requires the type that we pass to this function to be a std::ranges range, and
of course a std::array, as an iterable container, qualifies. We just iterate through those I/O definitions, configuring each one, one by one. This is really simple; literally, this is all you would need to write in the minimal case.

I mentioned: what do we do about passing GPIO references around? As an example I'm going to use an SPI bus. SPI is Serial Peripheral Interface; it's a serial bus that's frequently used in embedded systems. The model we're going to use is that we have a struct for the hardware bus, which represents the bus peripheral in the chip. In addition to the hardware, we have a protocol: SPI has a mode that describes its clock polarity, data polarity, and so forth, as well as a transmit speed. Then we combine those into an SPI connection, so our SPI connection has a hardware bus and a protocol. The other thing about SPI is that it's a bus with multiple slaves, or chips, attached to it, and the way we communicate with those slaves is that we assert a chip select. We could just pass this GPIO def along to the delegates, to the downstream code, and the downstream code would have to know that in order to select a chip we actually have to set the pin to a logic low. In this case we might think that's a little too much responsibility for downstream code, so we'll take a different approach to passing down a chip select. We're going to write just a tiny hierarchy of classes: first an assert type, then an assert type logic high, which just has a value inside it that says true, that is, when asserted the logic level is high, and then the opposite type, where when asserted the logic level is low. The important thing is that it just has these two little Booleans. Then what we can do is write a functor. A functor, for those who don't know, is basically
just a class; it has a little piece of data in it, and the data we pass into it is a GPIO def, a representation of an I/O pin. This class can be called as if it were a function, and we call it with just a Boolean enable. If we enable, or assert, this I/O pin, the value-when-asserted is then used to determine the correct action to take, and here we're just setting the registers in order to set the I/O pin to the correct logic level.

How might we use this? What would this look like? Again we come back to our SPI connection structure. It has as a member this GPIO assert functor as an enable function, and that gets passed to the SPI connection when we construct it. Our SPI connection has a read/write function; as you might think, it has some output data, some input data, and the length of the data we want to transmit. In doing this we simply call the enable function with true, meaning we want to assert it, and the type that we passed in determines what happens when we assert that pin, which means it's actually asserted as logic low. Once the writing and reading is finished, we de-assert the pin and the right logic level happens.

So let's bring it all together and see what it looks like. We have a std::array of our GPIO defs, our I/O pins, and again this could be the I/O definitions for our entire project. We configure the I/O pins all in one go, with just one call. We then construct our hardware bus, and then the protocol that runs on our hardware bus, and then we define as our chip select this GPIO assert functor with the assert type logic low, which means that when we want the chip's attention we're going to assert a logic low, and we pass it the appropriate I/O pin. Finally we create our SPI connection: we pass our bus, our protocol, and our chip select function, and this way we take responsibility for everything, including the design decisions
around what asserting a chip select means. Wrapping this topic up: centralize your I/O definitions and do them declaratively. You can abstract I/O functions, and this functor approach that I showed you really comes with little or no cost compared to some of the other options that you might have.

Next topic: creating lookup tables using compile-time expressions. Lookup tables are a really common feature of embedded systems. I'm showing an example here where we have an array, temperature from thermistor. A thermistor is just a non-linear voltage device, or rather resistance device, that relates a resistance to temperature. This is a really common application where we might have an array of temperatures; these are just floats, and each one of those temperatures corresponds to an ADC, or analog-to-digital conversion, value. So for each ADC value, 0, 1, 2, and so forth, we have a temperature value that corresponds to that code. Frequently we would generate this data via a spreadsheet and then copy and paste it, or we might do it via a script and copy and paste, or, if we were pretty sophisticated, we might generate a header file via a script and incorporate it into our build system. But all these options have some complexity and potential for error in them.

Some considerations when we're talking about lookup tables in embedded systems: constants, these lookup tables, really have to be carefully crafted so they end up in the .rodata section. What this really means is that our lookup table is going to be placed in flash and not copied to RAM. In an embedded system flash is a limited resource, but it's a lot less precious than RAM. Usually RAM is a really precious resource in our embedded system, so we want to make sure our tables don't end up in RAM somehow. The consideration on the hardware side is that it's very common for hardware and firmware to evolve together. So the example I have here
is a resistor divider that has a plain resistor and our thermistor temperature sensor and yields a voltage out. In the course of debugging our hardware we might decide that this upper resistor needs to be changed to a more appropriate value, and we might do that, for example, to move the temperature readings into a more linear range of the thermistor. In doing this we'd really like to incorporate our governing equations into the code and let the compiler generate our lookup tables. These are the things we might need. The first is the analog conversion: we have an ADC code, an analog-to-digital code, and we convert it to a voltage using this expression. Then we take that voltage and convert it into a resistance using this expression. Finally we take that resistance and turn it into a temperature using the Steinhart-Hart model, which is really kind of like a Taylor series expansion. You can see that this is a pretty expensive function, especially on an embedded system, so this is definitely something we would want to condense into a lookup table. What we can do is use constexpr expressions to both do the calculation at compile time and document our code right in the source code. Going parallel to the previous slide, we have a constexpr function, voltage from code; we pass some parameters, including the ADC code, and we yield a voltage out. Similarly we have another expression, resistance from divider, where we take that voltage and turn it into a resistance. And finally we have this kind of Taylor series expansion, thermistor value. We pass this function an array of coefficients, and the nice thing about doing this is that it is templated on an integer value, which is just the length of this coefficient array. For a thermistor we might look in the data sheet and find that they give us two or three or four coefficients, and we can apply them appropriately because we
just simply iterate over the coefficients and use powers and logs and so forth to calculate the temperature. To reiterate, this is a constexpr function; everything here is known at compile time, so the compiler can do all this figuring for us. Finally we have our thermistor table generation function. We pass our array of coefficients and a bunch of other parameters that are known at compile time. We create a std::array of floats, the floats are going to be temperatures, and we're going to create 256 values, one for each possible analog-to-digital conversion value. Again we just iterate over every possible ADC value, calculating voltage from code, resistance from divider, and so forth. When we instantiate this, what does it look like in our actual code? We have our thermistor coefficients here, which we can take from the data sheet; in this case there are four of them. We use our thermistor table function to create our thermistor lookup table, and again this is done at compile time. Later in our application that thermistor lookup table is available for us to do lookups, or in other cases state-space relationships, if we're trying to shortcut our control algorithm, and so forth. What I emphasized earlier is that we want our lookup tables to end up in the constants, or flash, area of our chip. You don't have to take my word for it; let's take the example that I showed, cross-compile it, and look at the binary. If we look at the sections, we go down and look at the .rodata section. We know for a fact the .rodata section is going to end up in flash, and we need to take note that that's section 18. Then we can look at our symbol table, also in the ELF file: we do a readelf and look at our symbols, scroll down, and find our lookup table, thermistor lookup, and we find that indeed it is in section 18.
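Since the slides are not reproduced in this transcript, here is a minimal sketch of the compile-time table described above. Every name in it (`thermistor::ln`, `voltage_from_code`, `resistance_from_divider`, `make_table`, `lookup`), the 3.3 V reference, the 10 kΩ resistor values, and the four coefficients are assumptions chosen for illustration, not the speaker's code and not values from any particular data sheet. It also assumes an 8-bit ADC, the thermistor on the low side of the divider, and the polynomial form 1/T = Σ cᵢ·ln(R/R_ref)ⁱ. A hand-rolled `ln` is used because the standard did not make `std::log` `constexpr` as of C++20, although some compilers accept it as an extension.

```cpp
#include <array>
#include <cstddef>

namespace thermistor {

// Hypothetical helper: portable constexpr natural log.
// Scales x into [0.5, 2], then uses ln(x) = 2*atanh((x-1)/(x+1)) + k*ln(2).
constexpr double ln(double x) {
    constexpr double ln2 = 0.6931471805599453;
    int k = 0;
    while (x > 2.0) { x /= 2.0; ++k; }
    while (x < 0.5) { x *= 2.0; --k; }
    const double y = (x - 1.0) / (x + 1.0);
    double term = y;
    double sum = 0.0;
    for (int n = 1; n < 40; n += 2) { sum += term / n; term *= y * y; }
    return 2.0 * sum + k * ln2;
}

// ADC code -> voltage at the divider midpoint.
constexpr double voltage_from_code(unsigned code, double vref, unsigned full_scale) {
    return vref * code / full_scale;
}

// Assumed topology: Vout = Vref * Rt / (Rfixed + Rt), solved for Rt.
constexpr double resistance_from_divider(double v, double vref, double r_fixed) {
    return r_fixed * v / (vref - v);
}

// 1/T[K] = sum over i of c[i] * ln(R / r_ref)^i; N is the coefficient count.
template <std::size_t N>
constexpr double celsius_from_resistance(double r, double r_ref,
                                         const std::array<double, N>& c) {
    const double l = ln(r / r_ref);
    double inv_t = 0.0;
    double power = 1.0;
    for (std::size_t i = 0; i < N; ++i) { inv_t += c[i] * power; power *= l; }
    return 1.0 / inv_t - 273.15;
}

// One temperature per 8-bit ADC code, computed entirely at compile time.
template <std::size_t N>
constexpr std::array<float, 256> make_table(const std::array<double, N>& c,
                                            double vref, double r_fixed, double r_ref) {
    std::array<float, 256> table{};
    for (unsigned code = 1; code < 255; ++code) {
        const double v = voltage_from_code(code, vref, 255);
        const double r = resistance_from_divider(v, vref, r_fixed);
        table[code] = static_cast<float>(celsius_from_resistance(r, r_ref, c));
    }
    // At the rails the resistance would be zero or infinite, so saturate.
    table[0] = table[1];
    table[255] = table[254];
    return table;
}

// Illustrative coefficients only; take real ones from your part's data sheet.
inline constexpr std::array<double, 4> coeffs{3.354016e-3, 2.569850e-4,
                                              2.620131e-6, 6.383091e-8};
inline constexpr auto lookup = make_table(coeffs, 3.3, 10'000.0, 10'000.0);

// 256 floats, matching the size discussed in the talk.
static_assert(sizeof(lookup) == 256 * sizeof(float));

}  // namespace thermistor
```

At run time a conversion is then just `thermistor::lookup[adc_code]`. A `constexpr` object like this is normally placed in `.rodata` by embedded toolchains, but that depends on the toolchain and linker script, so it is worth confirming with `readelf` in the way the speaker describes.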
So this is definitely going to go in flash as a constant lookup table. We also noticed that we had 256 values in our lookup table; they're floats, so 1024 bytes is in fact the correct length. To sum up: lookup tables can be very fast and time-saving or space-saving, and these are really often critical for embedded real-time applications. The nice thing is we can place the design support in the source code. There are no external processes, like a build process or copy and paste, that can lead to mistakes, and it lowers the complexity in the build process. When we make a change in the hardware we can just change it in the source code, and it gets pushed up to our version control system, so that is documented. And of course the tables reside in non-volatile memory.

Next topic: code with numeric structures that humans can actually read and edit. This is a topic that's really near and dear to my heart, for reasons I'll talk about in just a moment. Just like the previous example, a lot of embedded applications use long numeric structures; they're frequently known at compile time, and we would prefer to store them in non-volatile memory. Some examples: with Bluetooth, we have Bluetooth services, so we might have 128-bit UUIDs or a MAC address that's static. Same thing with IP: if your device is Ethernet-connected we might have IPv4 or IPv6 addresses that are static. Similar thing with USB. Obviously these are not strictly numeric structures, but USB is a little unusual in that it takes Unicode-encoded string descriptors. All of these are a bit difficult to deal with in the ordinary C world, but we can use features of the C++ language to really make them easier to deal with, and I'll show you how we do that in our code. So, address-like structures the old way. This is the canonical UUID, the way we would see it in our documentation, with braces and nicely segmented out,
but if we were just to do it in the standard way, with an array of bytes, we would have it just as an array like this, which I find pretty difficult to read, understand, and change. The same thing is true of the canonical MAC address: we prefer to look at it in the canonical way, but we tend to store it as an array of bytes like this. Same thing with IPv4 addresses: we might have an address and mask-bit structure, and sometimes that's a little hard to read and understand. I think it's especially the case with IPv6 addresses. IPv6 addresses obviously are very long, but the convention is that we have shortcuts to describe those addresses in a really short way, like we've shown here. But if we take the classical approach to numeric data, as an array, we have to explicitly call out all the parts of the address, including all the zeros that we would typically overlook. Similarly with USB strings: USB string descriptors have to be encoded as Unicode. I don't know about you, but I've seen this lots and lots in generated code or example code, where a device descriptor or other string descriptor in USB is written like this, interspersed with zeros. So it is Unicode, but again you have to remember, oh yeah, we're on a little-endian system, so the zeros have to go in the right place, and so forth. Or, if you take a more data-structure approach to your string, you can have the length of the string, then the standards-based ID of the string, and then the Unicode data. Or, worse yet, we can have a C-type macro which converts our string, letter by letter by letter, to this Unicode format. I don't know about you, but I find this really, really difficult and frustrating to work with.
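For readers without the slides, the hand-encoded forms described above might look roughly like the following sketch. The values and the names (`uuid_bytes`, `mac_addr`, `ipv6_addr`, `usb_string`) are made up for illustration and are not taken from the talk. The USB layout assumes the usual string descriptor format: a total length byte, the descriptor type `0x03`, then UTF-16LE code units.

```cpp
#include <array>
#include <cstdint>

// Canonical text: {123e4567-e89b-12d3-a456-426614174000}
constexpr std::array<std::uint8_t, 16> uuid_bytes{
    0x12, 0x3e, 0x45, 0x67, 0xe8, 0x9b, 0x12, 0xd3,
    0xa4, 0x56, 0x42, 0x66, 0x14, 0x17, 0x40, 0x00};

// Canonical text: 00:00:5E:00:53:01
constexpr std::array<std::uint8_t, 6> mac_addr{
    0x00, 0x00, 0x5e, 0x00, 0x53, 0x01};

// Canonical text: 2001:db8::1
// The "::" shortcut hides eleven zero bytes that must all be written out.
constexpr std::array<std::uint8_t, 16> ipv6_addr{
    0x20, 0x01, 0x0d, 0xb8, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x01};

// USB string descriptor for "Hi": total length in bytes, descriptor type
// (0x03 = STRING), then each character as a little-endian 16-bit code unit,
// so every ASCII character is followed by a zero byte.
constexpr std::array<std::uint8_t, 6> usb_string{
    6, 0x03, 'H', 0x00, 'i', 0x00};

// The length byte has to be kept in sync with the array by hand.
static_assert(usb_string[0] == usb_string.size());
```

Nothing in these arrays ties the bytes back to the canonical text in the comments. A mistyped digit, a dropped zero, or a stale length byte still compiles, and that is the maintenance problem being described.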
So there's really got to be a better way. This is the approach that we take. We can imagine a UUID class, and we're just going to take the simplest possible approach to the data storage part: it has private data storage of a std::array that just stores the data in bytes. It's pretty ordinary, other than that it has this operator"", which is the user-defined literal operator. We define this as both a friend of this class, so that it has access to its private data structure, and also constexpr, so we can construct UUID objects at compile time. Let's look at this user-defined literal operator in a little more detail. What we're doing here is declaring, again constexpr, so evaluatable at compile time, operator"" with _uid. Again, this is the user-defined literal operator, and we're going to be passed the text within the quotes and its length. What we're doing here is just the simplest possible hexadecimal parser: we're taking the characters zero to nine and translating them into the corresponding data, same thing with hexadecimal a to f, and then capital A to capital F. As we're parsing through we can check for bogus characters: if there are characters that we don't want in our string literal we can reject them here, or we can do as we're doing here, which is to implicitly pass over them, doing nothing. The nice thing about this being a constexpr expression is that when we do this as a string literal we're allowed to throw exceptions. If we have something that doesn't quite add up in our string literal, for example a UUID literal that doesn't have 32 valid characters, then we can throw an exception that has a meaningful error message, and again this surfaces at compile time, as a build error, where we can catch it. How
would this work in real life? Using the simple structures that I showed you earlier, we could declare our UUID as such, and we just pass the canonical version in the form of this user-defined literal. Same thing with a MAC or an IP address and so forth. All of these can then be parsed into structures that are useful for us in our application, including this USB string literal, which can be parsed with as much sophistication as we want, including the length, the USB standard's codes, and so forth. One thing I want to point out is that the C++ standard does give us a kind of Unicode literal, which we could use. One thing to watch out for is that this Unicode literal will give us a null-terminated Unicode literal, which in fact is going to be one more character than we actually need for our USB string descriptor. Remember, USB string descriptors are passed along with their length, so they're not null-terminated. So this is a really nice feature that we use for our statically declared data; it just makes looking at our code, understanding it, and changing it a lot more pleasant.

Next topic: using stream-based I/O but skipping the library. Everybody recognizes this; this is a simple hello-world application. The way we've gone about it, though, is we've used iostreams: we included iostream and we're simply sending hello world to standard out. What we can do is cross-compile this, and I've included all the gory details about cross-compiling it here, just to be transparent about what I've done. We've optimized for size and we've thrown in a few things that will trim down the code size in the typical case, so that we get a reasonably sized binary, and then we convert this into our binary so we can see how much space this is going to take up in our
embedded system. What we can do is compare a bare main, literally a main application that is empty, with what we just compiled. The 1.7K is basically just the interrupt vector table and a few other things, but simply adding iostreams and saying hello world has added a lot, really a lot, of code to our application, and all this stuff is going to have to be stored in flash. So the point here is: is there a way to do this but avoid a lot of the overhead that we incur in using the typical iostreams? The approach we're going to take is to create a lightweight file stream object. We're going to create a namespace MCU and create this class, file stream, and within it we're going to define a type. We're going to ask that our file stream object be able to output a stream of formatted integers. We'd like to be able to specify that those integers be presented to us in whatever radix we want: decimal, hexadecimal, binary, and so forth. This will be super useful for debugging purposes. As far as data, this file stream object just has a couple of little pieces of data: it has a Unix-style file descriptor, which is nothing more than a number, and it has this radix setting, so our file stream can remember what radix we want our output format to be in. We have a very simple constructor that just takes the numeric file descriptor. Then we get to the meat and potatoes of our file stream class. This is the stream insertion operator, and this is the overload that takes string pointers. What we do here is take the length of that string and then delegate the writing to our syscall, the write function, and I'm going to talk a little more about this write function later. We just pass the file descriptor that's part of
this data structure, as well as the string and the length. We can do this for other data types as well. Here we've constructed a stream insertion operator for integer-type data. Again I've templated this and used a requires clause, and again this is my new best friend, because we can craft this overload to take a number of different data types, and it's very descriptive about the data types we're going to accept here. It's a really super new feature of the C++20 standard release. The job of this function is to turn our integer into a string representation and then pass it to this write function, and the same goes for floating point or whatever other data type we would want to have in our output stream. How might we instantiate this? It's pretty simple. Again in our MCU namespace, we decide maybe we're going to have two output streams. One is going to be our debug output, and perhaps this is connected to the UART output on our board. Then we have another debug output, SWO, which stands for serial wire output. Serial wire output is really just a facility of the debug probe, so in the event that we don't have a UART or another output means attached to our board, we can get the debug output out through the debug probe that's attached to it. So there are two different options here for getting output out of our embedded system. I mentioned this write function: the underscore-write function is a syscall, and I'd say it's basically the plumbing for our various output modes. Our underscore-write syscall takes a numeric file descriptor, the data, which is basically a string, and the length of that data, and then we further delegate to our various hardware functions. One of them might be our SWO, serial wire output, which passes through our debug probe. In another file descriptor case we pass through
our UART function, and we can extend this to whatever debug functionality we want, right? So if we're Ethernet connected, we might have a UDP write that then goes out to a UDP listener out there that we're using to debug our system. So what does this look like as we use it in our application? Instead of iostream, we include our new file stream utility, and in our main, instead of hello world using the standard output stream, we use our MCU SWO to say hello serial wire debug, or use our MCU debug stream for a UART debug. So we can have various output streams and use them for different purposes and different phases of our debug use case. The nice thing about this approach is it's actually pretty small. So if we include all the hardware interface functions that you would need to make this work, and I have an example that shows this, and we go ahead and compile it and then convert it to binary, our trivially empty main function is 1.7 K, and a fully functional example that we show above here, along with writing to the hardware, what we're calling main better, is 2.3 K. So just about a half a K or so of additional code, which is a whole lot better than our typical iostream's 157 K. So we're able to get kind of lean, mean, stream-based I/O for really not much cost, and we can add some bells and whistles to our stream function, right? So here we have a troublesome function that we want to debug, right? We might have some numeric values or string values that we want to dump. So what we can do is use our MCU debug output stream and specify, hey, the next value that we want to have come through, we want to present it as hexadecimal, and so we pass our numeric value in and then we get a nicely formatted hex value. So it's pretty nice, and this is easily extensible. Though you may be asking, okay, what's wrong with printf and kind of the usual output formats
and so forth, and why would we use this instead of, say, the new std::print that's coming in the new standard? We find that it's really useful to extend to other types, right? So earlier I mentioned that one of our really common applications is machine learning inference right on the edge, and of course part and parcel of that is creating tensors and operating on tensors. So we frequently create tensors and need to look at them in the debug stream. What we would do is simply create a new stream insert operator for our file stream class, pass it a tensor, and then we would iterate over each of the dimensions in the tensor. At the base of that tensor is just simply elements, and we would pass those into our stream functions. So here we have a pretty simple way of expanding our I/O functions to include how we output tensors. Let me show you how that works. So here we're declaring a tensor result, it's going to be a 10 by 10 tensor, and we just say that we want to see that in the output stream. Using the method I had in the previous slide, we could simply output something to the debug stream that looks like this. So it's pretty easily extensible to all sorts of data types, you know, USB types or any sort of complexity of data types that you want, really just by writing a pretty easy-to-write function. And so this is really, for us, the magic of using stream-based I/O: it makes debugging so much easier, and it's really extensible to all sorts of types. For the next topic, I'm going to grab the third rail of embedded systems with both hands, and I'm going to say that we can dare to use the heap. Of course, that's going to come with some caveats, and you probably only want to do this in non-safety-related applications, but using heap in embedded applications, or dynamic allocation, is usually a no-no. Runtime behavior
can cause heap exhaustion, right? So we might not plan for the corner cases where we would try to allocate more memory than we have available. Sometimes long run times can cause fragmentation of the heap, and that's really problematic because we have enough memory, we just don't have a contiguous piece of memory that's suitable to allocate. And typically heap errors have no graceful resolution, right? You typically end up in some sort of crash, and that's really no good, especially for safety-related applications. So as a result, many applications stick to static allocation, right? That means that a lot of our dynamically allocated containers are off limits, right? std::vector, std::map, list, double-ended queue, and all those really nice things to use. But I'm going to propose that we can imagine an allocate-once scenario where heap is actually reasonable and desirable to use. The example I'm going to use goes back again to this machine learning inference that we use in some of our products. As part of that, we construct a calculation graph that is sort of a directed graph of calculation nodes, right? So we have a class calculation node, and in this node we define sort of a shortcut handle to pointers to other calculation nodes, and really what this is is sort of a fancy linked list, you know, of a calculation node with child nodes. So we have an iterable collection of child nodes, and this calculation node can do one of two things, right? We can override this calculate function so it does useful work for the calculation, or, if this calculation node has child nodes, it can iterate through those child nodes and ask those nodes to calculate. So in this way we build up a directed graph of calculation that's sort of the bread and butter of our machine learning inference. And since we're doing machine learning, and we would like to update our machine learning models, maybe even the topology of that model,
we would really like to be able to dynamically allocate this structure of nodes. But the thing is, it's only done at initialization time, so once we configure and allocate everything, that's it, there's really no deallocation. And this is deterministic in the sense that we know the data that we're passing to our application, so we know that we're not going to overrun the amount of memory that we have available. So this case is a good opportunity to maybe use arena allocators. An arena allocator is useful for monotonic, allocate-once applications, that is, you only allocate memory but you never deallocate memory. It's really fast, and it has very low overhead, or really no overhead, because we're simply dispensing with all the bookkeeping that is required for allocation and deallocation and reallocation and so forth. Preferably we have deterministic memory usage, which we said we do, because our data is known to live within the memory constraints we have. Arena allocators: I'm going to talk a little bit about them, but only on a very superficial level, so I recommend you take a look at John Lakos's talk or Bob Steagall's talk, where they go into arena allocators in a lot more depth than I'm going to. So the most basic, most simplistic way of using arena allocators is overriding the global new operator. What we do is define our own new operator, and what we're going to say is it's going to have an arena size, and now we're going to statically allocate an arena, a memory chunk of that size. What we can do is calculate a pointer within that arena that's offset by the number of allocated bytes, then we advance our allocated bytes count by the size that we're trying to allocate, and we return that pointer. So basically all this is doing is taking that big chunk of memory and doling out chunks of that in a very sequential fashion. Overriding the global new operator is not for the faint of
heart, right? We have to really make sure that other parts of our application don't rely on having allocation and deallocation, and there are some other pitfalls that may happen here. So another option is to override new on a per-class basis. In order to do this, we're going to define a class called arena allocator, and just like our other function, it has an arena size and a statically allocated arena of that size, and it just has an allocate function which does exactly the same thing we did before: it just doles out chunks of that arena for use by our code. So we can rewrite our calculation node in a different way here. In calculation node we're going to override the new operator for that class, and it's basically the same: we just have an instance of that arena allocator and ask it to allocate us our chunk of memory. What's going to happen here is every time we new up a calculation node, it's going to use this new operator to dole out dynamic memory. The thing we have to watch out for is that within our calculation node class we do have a dynamically allocated container, this std::vector. So what we can do is pass this std::vector our arena allocator type, so that we can ensure that as we add elements to our std::vector container, that is, as we add child elements, that container is going to use our allocator to allocate memory as well. So in this way our whole calculation tree can use this arena allocation scheme, but not necessarily mess with the allocation system that's going on elsewhere in the application. So let's try a death-defying demo here. Let me walk you through just a little bit of code that we're going to use as a demo. Here we have an operator new, and this is doing basically the same thing as I showed you on the other slide, other than I have a little constexpr switch here that says either we're going to use this arena allocation method or
we're going to use the sort of traditional malloc method. So I have a use-arena switch; we're going to set it to false first, if I can type. Then we're going to go over here, and then we're actually going to build and run this on a development board that is attached to my desktop computer right now. So we're compiling, we're actually flashing onto the board, and we find the number of allocations is 341. Let me show you what we just did here. Sorry, my main function here, this is the heart of what we just did. We just have a loop that allocates pointers to size_t until we're exhausted, so we just allocate and count until we can't allocate anymore, and then we use our MCU debug facility to display our number of allocations, and we see our number of allocations is 341. So what we can do is go back and now use our arena allocator, just by setting the switch, and we'll go back and build and run again, and again we'll compile and flash to our development board. Now we find the number of allocations is a lot more, right? Which you would expect, because using arena allocators we've just kind of done away with all the bookkeeping, right? So as a result we're able to do three times the allocations in this trivial little example as we were able to do before. All right, to wind this topic up: it's possible to do allocation while avoiding the pitfalls of fragmentation. It's important to do that at initialization time and be thoughtful about what we're doing, but we can avoid heap overhead and really do some pretty interesting stuff. But the moral of the story is: be safe. All right, so our next section comes with a really grand title: implement one function to unlock all of the time functions. We're going to show some code that you can write to unlock some features of the C++ standard library, specifically std::chrono. So if you ever look through std::chrono
on cppreference in your free time, and of course who here doesn't, you'd find there's some really interesting stuff in std::chrono, you know, some really nice timing and calendaring functions. In terms of embedded systems, there are kind of two different flavors. We have relative time, which might be a hardware timer, where the hardware timer counts from zero to some very high number and then wraps around to zero again, and then we have absolute time, which might be our real-time clock. The applications this may be relevant for are performance instrumentation and maybe time-of-day and calendaring. In our quick little example here we're going to concentrate on performance instrumentation, and what we're going to do is show you how to unlock std::chrono. So the first thing we're going to do is write a little function that configures the hardware. What we're doing here is have a microsecond clock configure function, and what it's going to do is configure a hardware timer to be a monotonic timer counting microseconds. So what we do is pass the hardware representation of our timer and a few configuration parameters, and we're going to ask for a timer frequency of one megahertz, which corresponds to our one microsecond tick. What we're going to do is just set our hardware registers, you know, turn on the timer, set some prescalers and so forth, and so this timer is going to count from zero up to hexadecimal all F's and then wrap around and go again. So we've configured our hardware microsecond timer. What we do is, in the std::chrono namespace, we're going to overlay our own version of this high resolution clock now function, which is just a simple function that returns a std::chrono time point. So the now function just returns this time point, and all we do here is we have our microsecond counter, again this is just a value in microseconds, and we convert that to nanoseconds. So we take microseconds, multiply by a
thousand, we get nanoseconds, and then we return that as a time point. All right, let me show you what we can do with that simple little bit of code. So here I've got just a simple benchmarking application. This function here, benchmark, takes a std::function, a function of our own definition, as a parameter, and it takes a little label. What it does is use std::chrono to store a start time, then we execute our function to be timed, and then we calculate the elapsed time, again using our std::chrono facilities, right? And then we output that to our MCU debug stream. So really what I've written here is Google Benchmark, all in about five lines of code, right? And what we're going to time is my expensive function here. This is just a kind of expensive function, it has sines and cosines and exponentials, and what we're going to do is use std::chrono functions not only to do the timing of our function but also to do an endless loop of delay and then evaluate this benchmark function. So: benchmark, in microseconds, my expensive function, and output it with the label trig function. What we're going to do is go and compile and run this, again this is going to run on our dev board, and this is going to output our function's timing. So again, we've kind of recreated Google Benchmark all in five lines of code using std::chrono, and all we had to do was write just a few lines of code. Our last little thing: we're going to implement a little bit of random and get some free beer... not beer, library code. So the scenario is this: your boss comes to you and says, hey, we're going to build a flameless candle. It's going to have an LED with a variable intensity that has a really realistic candle function. In order to do that, you want to implement a random number generator with this probability density function instead of the usual uniform density function. So what that probability function really gives you is sort of a random
walk that moves with a small variation, and then every once in a while you get a big variation, and so this is going to effectively simulate the flicker activity of our flameless candle, and this is going to be really, really realistic and fantastic. And before you say this is a really contrived example, nobody would ever do this, just remember I work for a consumer products company, and we actually sold this product for a while, right? This is a flameless candle that uses much the same statistical approach that I showed you in the slide. So, fun with random numbers, right? We're going to write our bimodal, or trimodal, depending on how you want to call it, distribution function, and we're going to make use of lots of functionality in the C++ standard library, right? It's including a normal distribution for a first mode, a normal distribution for a second mode, and we're going to use this uniform real distribution as basically a weighting function. So we're going to use that uniform function to weight our probabilities between these two modes. And what does that look like? Our bimodal distribution has an operator() which takes a random generator device, it calculates a random number based on this uniform distribution, and then based on the value of that number we either pick this first mode or we pick the second mode for our distribution. So that just sort of sets the weighting of our two modes of our bimodal or trimodal, and we use the signum function to make it symmetrical about the axis. So what we can do is go on the PC and, using totally available facilities on your PC, you know, std::random_device, we instantiate our bimodal distribution, and then we can calculate a bunch of random values using this distribution, using our PC's random number facilities. The next thing we could do
is notice that our microcontroller, since it's capable of cryptography on the microcontroller, actually has a random number source, a random data source. So what we can do is, again in our namespace MCU, create a random device, configure that device, and then give it an operator() that returns random data, and so we can do the same thing with very minimal changes: create our random distribution like so. We'll wrap up with one last demo here, and I will show you the code really quickly. So we are creating our bimodal or trimodal distribution, however we want to see it, we'll create a histogram array, and down here we're basically going to create an ASCII art histogram. So what we're going to do is just build and run this application, flash it on the board, and here's our ASCII art numeric distribution. So this is actually generating random numbers in a very unusual distribution using the random number facilities in our microcontroller. So with that, I'm running a little short on time. I'll say thank you and encourage you, if you have questions, please put them in the Q&A, and the moderator is going to help me answer a few of them, and if I don't catch you here, I'm going to make sure that your questions get answered in the Discord chat. So be sure to stop by, I'd be happy to chat there.
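For readers following along without the slides, the per-class arena allocation described earlier in the talk might be sketched roughly as follows. This is an illustrative reconstruction, not the speaker's code: the names ArenaAllocator, CalculationNode, allocate, used and arena, the 1024-byte arena size, and the alignment handling are all assumptions, and the std::vector of child nodes with an arena-backed allocator is left out for brevity.

```cpp
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical monotonic arena: hands out chunks of a statically sized
// buffer sequentially and never frees them (allocate-once usage).
template <std::size_t ArenaSize>
class ArenaAllocator {
public:
    // Returns nullptr when the arena cannot satisfy the request.
    // The alignment parameter is an addition not mentioned in the talk.
    void* allocate(std::size_t size,
                   std::size_t align = alignof(std::max_align_t)) {
        // Round the current offset up to the requested alignment.
        const std::size_t start = (allocated_ + align - 1) / align * align;
        if (start > ArenaSize || size > ArenaSize - start) {
            return nullptr;
        }
        allocated_ = start + size;  // advance the allocated-bytes count
        return &arena_[start];      // pointer offset into the arena
    }

    std::size_t used() const { return allocated_; }

private:
    alignas(std::max_align_t) std::uint8_t arena_[ArenaSize]{};
    std::size_t allocated_ = 0;
};

// Hypothetical node type that routes its own allocations to the arena
// without touching the global operator new.
class CalculationNode {
public:
    static void* operator new(std::size_t size) {
        void* p = arena().allocate(size);
        if (p == nullptr) {
            // Assumption: exceptions are enabled. Many embedded builds
            // disable them and would trap or halt here instead.
            throw std::bad_alloc{};
        }
        return p;
    }

    // Allocate-once: memory is never handed back, so delete is a no-op.
    static void operator delete(void*) noexcept {}

    // One shared arena for all nodes; 1024 bytes is an arbitrary choice.
    static ArenaAllocator<1024>& arena() {
        static ArenaAllocator<1024> instance;
        return instance;
    }

    virtual ~CalculationNode() = default;
    virtual float calculate() { return 0.0f; }
};
```

With this in place, every new CalculationNode draws from the class's arena, while the rest of the application keeps whatever allocation scheme it already uses.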
Info
Channel: CppCon
Views: 30,811
Keywords: C++ embedded programming, embedded programming cpp, embedded systems, embedded c++, embedded systems c++, embedded, arena allocator, microcontroller, c++ standard library, memory constrained cpp, c++ memory, Modern C++, C++ Patterns, Steve Bush, modern C++ idioms, embedded developer, embedded developer c++, c++ talk, cpp talk, c++, cpp, cppcon, c++con, cpp con, c++ con, programming c++, cppcon 2022, cppcon videos 2022, cpp embedded systems, C++ Standard Library, embedded dev
Id: 6pXhQ28FVlU
Length: 60min 13sec (3613 seconds)
Published: Thu Jan 05 2023