Optimizing C for Microcontrollers - Best Practices - Khem Raj, Comcast RDK

All right, thanks for coming. My name is Khem Raj; I work for Comcast on the RDK project — the Reference Design Kit, a very innovative name — which is basically our set-top box operating system, based on Linux. Part of my work involves embedded Linux, but it is also extending to smaller devices like security cameras and sensors, the things that go into your home for security or otherwise. Today we are going to talk about C language constructs that are specific to microcontroller programming, and I am going to cover these few items. Know your tools — very important, because there are several toolchains, different compilers, and other tools that we use, and they all behave differently, so it is very important to know them. Data types and sizes — on general-purpose chips the processor word lengths are known, but with microcontrollers there are 8-bit, 16-bit, and 32-bit parts, and you need to know which one you have. Then how variable and function types can help, what you can do in loops, and assembly — we will talk about assembly; it is contentious, and many times it depends on what you are doing and which compiler you are using. Then some thoughts on RAM optimizations, and in summary, what to keep in mind when you are programming for microcontrollers.

This is an open session — feel free to jump in anytime with your experiences. If I present something that is wrong, or that you feel can be done better, I will be very happy to discuss it further; I would love to hear it. What I have done is taken Zephyr as a sample, and I am going to cover the GNU compiler mainly. There are other vendors and other compilers that might offer more options; I am not going to cover those here as much, so it is primarily around these two projects.

Knowing your toolchains: as I was saying, there are many vendors. There is the GNU
compiler, and then IAR and others — there are many, and each compiler has either added new features that are not in the standard, or has additional options that fit the target microcontroller it is aiming at. So you have to know what tools you have at hand, and it is very important that you go through the whole documentation: how do they represent far pointers, how do they represent near pointers — they may differ. If you are after writing more portable code, it is important that you either find alternatives to those features, or use them in such a way that you can disable them, or have them as no-ops, when you are not using those compilers or those compiler switches.

What I have shown here is a simple table. I took a first hello-world application and compiled it at different optimization levels; -Os is explicitly meant for size optimization. Keep in mind that this is GCC in purview — other compilers might have different naming conventions for these -O options. If you are really looking for help from GCC to optimize your code for performance, the main choices are -O2 and -O3, or -Ofast. But as you keep increasing the -O level, it can also start getting a little bit less accurate, so when you change an optimization level you have to know whether your algorithm can sustain it. Keep in mind that compilers are tools written by other people — I am a compiler developer myself, so those are applications too — and they need help: when you feed them the right kind of information, they give you good results. Most of what I will cover in this talk is what you can help the compiler with to get the maximum out of it. As you can see, the code size goes up as you increase the optimization levels, but if you have a
real-world application you would see the execution get faster as well. In some cases, optimizing for size actually improves your code performance; it depends on your bus width and how you can utilize the memory bus. For example, if you have a 32-bit memory bus and a 16-bit instruction set, size optimization can improve your instruction cache usage, and there are workloads that, compiled for 16-bit encodings, perform better than 32-bit. So execution time really depends on those kinds of constraints.

-Og is a relatively new optimization level. During development, you can use this option to get a good debug view. It does not give you just a raw translation — which would mean highly unoptimized code — but it applies only the optimizations that preserve a good debugging view, so it does not tamper with your debug experience. Most of the time, when you enable optimizations and then start debugging, the code flips around, you do not follow the logic flow, and when stepping through the code you lose context very easily. So if you are a developer who wants a good debugging experience along with some optimization, -Og is a good option to try.

The other item in your tools portfolio is the linker script — very important, because in microcontrollers you decide where the code goes. GNU ld has a very elaborate linker scripting language: you can define your flash memory layout and how it is loaded, which sections go where, and you can also define symbols and other items wherever you want in your image, for help during execution. If you look into the GNU linker manual, it has a very elaborate syntax for linker scripting. Most of the time we take an existing linker script and enhance it; what I found is that it is very interesting and
important to understand all the constructs that go in: how you define your segments, what your sections are, what your alignments are, how you define the total length of your data sections, where you begin them and end them, and how you want to initialize them.

[Audience question] Right — the question is whether there is a recipe, a template, for writing your linker script. Generally, it really depends on your architecture: where you are storing your code and where you are executing from. Many times you store your code in flash, then copy it over and execute from RAM; many times you do execute-in-place, running from flash; sometimes you have SRAM and sometimes you do not. So what you do is define your data and flash segments accordingly, and of course alignment is very important — where you align things, where your addresses are. Generally, your read/write data and initialized data is what you consider for your flash size, and then how much stack you want, whether you want stacks elsewhere, and where you lay out your runtime RAM data — all of that depends on your system. I will not say there is one recommended way to do it; the point of these scripts is that they give you all the tools to define your memory maps the way you want. So it is important that you read through all the keywords available to you: how you can define a memory overlay, and the other important aspects of scripting that you will need during execution of your programs.

The linker map is a good tool to see the output of how you linked your application. Many times we link the application and want to see where the whole thing is lying — which section went where — and the map gives you the whole view of how your memory is laid out. That is very important.
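To make the linker-script discussion concrete, here is a minimal GNU ld sketch for a hypothetical part with flash at 0x08000000 and RAM at 0x20000000 — the origins, lengths, and symbol names (`_sdata`, `_sidata`, and so on) are common conventions assumed for illustration, not values from the talk; adjust them for your actual memory map:

```ld
/* Hypothetical memory map -- adjust origins/lengths for your part. */
MEMORY
{
  FLASH (rx)  : ORIGIN = 0x08000000, LENGTH = 256K
  RAM   (rwx) : ORIGIN = 0x20000000, LENGTH = 64K
}

SECTIONS
{
  /* Vector table and code execute in place from flash. */
  .text : {
    KEEP(*(.vectors))
    *(.text*)
    *(.rodata*)
  } > FLASH

  /* Initialized data: stored in flash, copied to RAM by startup code. */
  .data : {
    _sdata = .;
    *(.data*)
    _edata = .;
  } > RAM AT > FLASH
  _sidata = LOADADDR(.data);

  /* Zero-initialized data occupies RAM only. */
  .bss : {
    _sbss = .;
    *(.bss*)
    _ebss = .;
  } > RAM
}
```

The startup code would then copy the region starting at `_sidata` in flash to `_sdata`..`_edata` in RAM and zero `_sbss`..`_ebss` — exactly the "stored in flash, copied over, executed from RAM" flow described above.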
For a small application, that matters when you are either doing optimizations, or checking whether any dead code got linked in, or whether code that must be placed at a specific location really is there — for example, your interrupt sections might be required to be at a certain address: are they there? Or simple things like which function is adding a lot of code to my application, when you are optimizing for size or for other reasons. The maps you create during your link step can give you a lot of insight into how you built your application. There are also post-processing tools — at least ones that work with the GNU linker — which can take the raw dump the linker puts out and present it in a more human-readable fashion: how much memory is in use, where things are, that kind of visualization. The map also tells you what the linker ignored. Sometimes you spend a lot of time wondering why a certain function was just thrown away; the map tells you, "these functions or symbols I found are unused, so I decided to throw them away." That is very good debugging help — many times you debug only to find that a certain piece of code was discarded. So it is a very good tool to visualize how the whole application is laid out in the end.

GNU binutils also offers a few very useful tools. With objdump, one of the features I use very often is -S with -d: it interleaves source with assembly. It takes your final ELF file and interlaces your assembly code with your source code, so I get a good view of what code was generated for a particular line of my C code. Many times, when you are looking out for various
optimizations — or possibly wrong code generation — this is very useful, because at that point you can associate your assembly code with your C code.

There is also the size utility. It reports the sizes a particular application will occupy in terms of text (your code), data (initialized data), and bss (uninitialized data). This is a good way to keep a tab on new code so you are not bloating it: you can put a watch on these values — maybe add it to your build system at the very end, to dump the size — and see how much is being added as you write code. It is actually very important to manage the size of your code regularly, and a few tools help you do that. readelf gives you a dump of your application: you can see the program headers, so it gives you information similar to size, but more refined — what addresses are allocated, what physical address each segment is located at, and what flags it carries. These are a few tools, part of the GNU toolchain alongside the compiler collection, that give you more insight while you are developing. I am pretty sure the tools you get from other vendors are similar — look for their equivalents, or maybe there are better ones.

Now, moving on to the kinds of things you need to keep in mind. For variables, size is important: it is very important to know the processor word length. If it is a 16-bit processor, your natural integer is 16 bits; if it is a 32-bit processor, it is 32 bits. Using the natural word size your microcontroller's processor reports is usually a good way to go; we will have a few examples showing how deviating from it can cause inefficiencies.

Globals: generally, globals have value in that
you can access their state from anywhere — but they also have a cost. The cost is that the compiler cannot assume their state is stable, so it always has to load and store them, and those extra loads and stores can give you very inefficient code. So it is very important to look at whether your function can work with local data instead, or whether you can achieve what you want some other way than global data.

Here is a little example about word length — this is Cortex-M3 code generation. In one function I am passing integers, and in a similar function I am passing short integers, and you can see the code generated underneath. It is optimized for size — this is not raw output; all the relevant optimizations are already applied — but there is still an extra instruction generated for the short version, which does a sign extension. It has to sign-extend because it sees we are doing arithmetic on a short integer, and it must ensure the carry and sign bits are calculated properly. You can avoid that if you control what data you bring into the function.

[Audience question] Yes — the question was whether there is a fast version of these integers. Good point; I have a slide on exactly that. So: slow and fast integers. There are extensions in the newer C standards called fast and least integer types, which are a more portable way of writing non-word-length integers or data types. They give you a choice of whether your algorithm needs fast access or can live with slower, smaller storage.
Alongside uint8_t, the standard provides the least variant, uint_least8_t, and the fast variant, uint_fast8_t. This is again what I was mentioning earlier: tell the compiler about your data, tell it about your program's execution, and it will do better work for you.

[Audience] It is C99? Yes — it is C99, I think; I will correct the slide before I upload it. These are a very good way to optimize your use of integers. Look at them — they are pretty useful, because many times you can afford one or the other, and they come in very handy. And they are in the C standard, so any compiler that claims to be standard-compliant has to implement them.

One caveat: sometimes libcs are nosy, and they provide their own understanding of these defines. If you are using a libc with your RTOS, look into it — it may be overriding what the compiler provides. I at least saw that happening with Zephyr, and I just wanted to share my experience: I was struggling hard to see where these types were coming from, and what I saw was that the includes were lined up in a way that overrode what the compiler was providing as its standard stdint. So look at that: the compiler might claim to be C99, but you might have a libc with your program that is overriding it.

Portable data types, again: many compilers provide extensions — "here is a way you can represent the data" — but over a period of time C has included many of the data types that made sense into the standard. C99, for example, has uint8_t, uint16_t, uint32_t, and uint64_t. They are very portable representations, so utilize them as much as you can, and just do not define your own — the standard ones keep you compliant. In the past you might have done it; I know in microcontroller programming you often had your own hosted header file, included in all your source code, defining these types per compiler.
You do not need to do all of that anymore if you just follow the standard and expect the compiler to provide those defines for you.

[Audience] Yes — stdint.h is what you will include; there is a header underneath that actually defines them. That is a sub-inclusion — I think I dug too deep into where it actually comes from. I just wanted to show you where these definitions live in GCC, but if you want to include it in a program, you should include stdint.h. That is a very good point. So much for portability of your data types.

Now the const qualifier. We will have a few examples, and it is quite interesting. Again, by qualifying your variables and your functions you are providing additional information on which the compiler can act. When you say const, you are telling the compiler that this data is not modified, and that can act as a hint allowing more aggressive optimizations: it can do a much better job of assuming what a variable you pass into a function is supposed to do. In code generation the compiler is a pessimist — it has to generate code for all cases, so if there is one chance of going wrong it will not take an optimization unless it is very sure it will always work. By qualifying, you give it more room to play. If you use const variables, you can also let the compiler rematerialize them: if they are constants with predefined values, it can reconstruct them during execution, so it does not have to incur a load from memory — and if they are stored in a slower medium like flash, accessing them costs a lot more. So use const when you can.

Here is an example — it is pretty much the same function, and all I have done
is define the globals as const in one case, and you can see the compiler has rematerialized them in the generated code: it is not doing any loads from flash. Without const, even though your constants are predefined, it still has to go and load them from flash — and if such code sits in a loop, you can see how much impact that has on your execution path. In the first example it loads from a memory address, adds, multiplies, sign-extends, and returns; in the second example it reconstructs the constant, sign-extends, and returns. That is much faster code.

Then, const and volatile: do you think such a thing can happen — can we have a const volatile variable? Anybody? Yes? Can we have one? Yes — any example someone can think of? There you go, very good: the example is a hardware status register. The hardware changes it, so it is volatile; your program only reads it, so it is const.

Global variables, as we were also discussing earlier: here we define an extern int x, and in the code generated for the function below, every time it loads x from memory, then stores it back, then loads it again and stores it back. This is the impact of globals that you will generally see throughout your code, and it does not matter which architecture you are on — these are general problems. The next example illustrates global versus local usage. It is the same function: in the first version the variable is global, and there are three loads and stores before calling the print function; in the local version, where you transfer the value into a local, the compiler knows a local cannot change state outside the function, so it can just load it into a register and pass it on to the call. So keep in mind, when you are designing your routines: can you live with locals?
It helps especially when you are operating in a loop. If you really need a global, you can still transfer its value into a local — if it is not modified elsewhere — operate on it, and store it back at the end. Essentially you do the optimization yourself; you are again helping the compiler rather than relying on it.

Static variables: what I see in static variables is, again, that you are making a statement about scope — the variable is only available to that particular compilation unit. What that enables is spatial locality, which is very important. When you are linking the program, the linker knows all these variables come from the same module, so it places them one after another, or at least it knows the map. Since they are placed together, and most probably you access them together — you are in the same function, say — the compiler can generate code that uses a base address plus offsets to address the different static variables. It can perform that optimization for statics; if they are global, it has to assume they can be anywhere in memory, and the linker cannot perform it.

Static functions: I know we often use macros, and macros versus static functions is a debate people have many times. One advantage of static functions is that you let the compiler decide when to inline. Many times the compiler knows more than us and can do a better job of inlining than we can ourselves, so we should give it the chance to inline and then optimize, rather than deciding ourselves what to inline and what not — because it knows the instruction lengths, it knows how many cycles things take, it knows all the delays, and it can calculate the total execution cost of a function far more cheaply than we can by hand.
So my recommendation is: always give the compiler a chance first, and if it fails, then you kick in and help it.

Another thing static functions give you is debugging: when you write functions instead of macros, you can debug them better. Compilers have come far — I think GCC can even do macro debugging if you enable the extreme level of debug information — but then you end up with much bigger debug data to deal with in your debugger. Static functions are lightweight by comparison: you do not have to enable that extensive DWARF debug information to see through your macros. The other thing is that the compiler already knows during compilation where a static function is going to be laid out — unless you are using whole-program optimization and the like, the location can be pinned — so even when a call is not inlined, it can optimize the jump: it can use a short static branch instead of a veneer or an indirect jump. So it even helps create a better calling sequence.

Volatile: with volatile you are telling the compiler, "please do not do anything to this variable — I know it is special, and I want it defined exactly as written." What happens then is the compiler does not optimize your variable unnecessarily. A real-world example I can give you: you have four 8-bit registers in a row — you have defined them as char or uint8_t, laid out one after another — and you do a read access to all four of them. Without volatile, the compiler can decompose that into a single integer access:
instead of emitting LDRB — load byte — or STRB — store byte — for each one, it coalesces all four, because it sees they are one after another in the address space and a single four-byte load will do. But that is wrong, because these are registers: you want to access them one after another. So it is very important that you qualify that kind of data with volatile, telling the compiler to stay away from optimizing it in any way. There are certain compilers — proprietary ones, I know — that have extensions for qualifying volatile variables: place it here, place it there, those kinds of hints. But they are all non-standard, so if you are using such a compiler, be sure that those only take effect when that particular compiler is used. Staying standard keeps you portable across compiler toolchains and even across architectures — a different architecture may come with a different toolchain, and you would be in a fix otherwise; remaining portable across architectures and across tools helps you port your applications quickly.

Array subscript versus pointer access — again, moving to how you can represent your data. Here is another example I tried: essentially the same code, but in one case I use pointers to access the data, and in the other I just use the array as such. What you can see is that the compiler's understanding differs even though the code looks the same: with pointer accesses on global data, it is loading and storing the pointer itself, while with plain array subscripts it does not have to do that extra pointer manipulation. The whole idea is: watch out for such use cases, look at the assembly the compiler is generating, and in many cases you will see that the results depend on how you wrote it.
[Audience] Correct, yes. In this example I was using optimize-for-size as my default level; if you are using a different compiler, it may apply that subscript-to-pointer conversion optimization in one case and not in the other. So look out for that: either enable the relevant switch explicitly if you need it, or understand that it is not doing the array-subscript-to-pointer-access conversion. In this case, what you can see is that I am defining a pointer, and the compiler is not able to resolve the aliasing well enough to optimize it away. That is just an illustration from the compiler's point of view — with more aggressive optimization enabled, it probably would have identified that the pointer is safe.

[Audience] Yes, you can use the restrict qualifier — but you basically have to understand that if you want it to work the way you are accessing the data, it is important to understand what the compiler generates. With restrict you are telling the compiler not to worry about aliasing — you know the pointers do not alias — so you will not see the same issue; you may still have a single load from the parameter list, or a transfer, but the repeated reloads go away.

Loop increment versus decrement. This is actually applicable everywhere. When you count up, you have to test the loop counter against your bound value; when you decrement, you are waiting for it to reach zero, and architectures provide instructions to test against zero. So when you are decrementing your loop, the compiler can take advantage of those instructions. What you see here is that it emits a SUBS, which sets a flag, and the
instruction below does a branch-if-not-equal-to-zero — so the check and the branch are fused into one. On the increment side, you see it has to make an explicit compare against the value 100, which sets the flags, and then branch. So counting down can save you an instruction.

There is also post- versus pre-decrement — whether you should use --x or x--. Both versions of the algorithm here do the same thing: print the value ten times. But if you look at the generated code, the pre-decrement version comes out better, because the compiler can apply the operation first and then use that value throughout the loop, whereas in the other case it also has to keep the old value around and apply the operation afterwards.

Function parameters: very important, and it really depends on the ABI. I was using the ARM ABI here, on Cortex-M3, and it is documented extensively — all architectures with a common ABI across tools document how many registers can be used for parameter passing and how parameters are passed. Read through that for whatever microcontroller you are using; ARM has a very strong ABI that all tools follow nowadays. If a call needs more parameters, or more registers to represent the parameters, than are available, it is going to use the stack, which is expensive. So see how you can write your function signatures to utilize the registers given for parameter passing in an efficient way. One more thing: with function parameters, alignment also matters, so I will show a little example here. On ARM, r0 to r3 are the four parameter registers you have.
In the first function I am taking an int, a long long, and an int; in the second, an int, an int, and a long long. In effect you need four registers to pass them either way, but the problem with the first ordering is alignment: the long long is 8 bytes, so it needs an aligned register pair, one register goes empty because of the alignment, and in the end you see the third parameter spilling onto the stack. These details can help you a lot when you are laying out your parameters.

Inline assembly: use it when you have to. In many cases the compiler may not be able to generate the instructions you want — say you are accessing a particular coprocessor — and inline assembly helps you insert those into your normal C program while letting GCC take care of the data-flow analysis, so it can assemble that code into a C function. Intrinsics are one good example, and so are special instructions, if you have any. I have given a link here to GCC's inline assembly syntax; it is quite cryptic, but it is well explained, so read through it and see how you can use it efficiently. It has constraint qualifiers that let you define the registers, the inputs and outputs, and the constraints on those. I have seen other compilers have their own variants too, and their syntax varies a lot — inline assembly is actually the most common thing that breaks when you port a program from one compiler to another.

Optimizing for RAM: use smaller data types — we talked about that. When you use smaller data types you use less memory, which can help if you are on a RAM-constrained system.
Packed structures are one option; I think compilers of all kinds support them, for example through __attribute__((packed)) in GCC, so use that, or reorganize your data structures so you don't have much padding in between. In some cases you can't, because it's a network packet or an IP header and you can't do much about the layout.

Also, know about local variables and use them as much as you can, but be aware of details like alloca: you would see in the code that it doesn't release that memory even though you think you are only using it locally; the allocation stays live until you return from the function. So if you need memory you can release mid-function, you have to use malloc and make calls to free instead.

Merging constant computations is very important on RISC, where the compiler has to reconstruct some constants, as we saw in one example, and that helps quite a lot. And check your stack and heap usage to see that an extra allocation isn't pushing you over a limit; you can then budget how much stack and heap your app gets and use less RAM overall.

How does the compiler operate? I've been mentioning that it doesn't have a magic crystal ball: it operates on what you give it, and it makes worst-case assumptions. Pointer aliasing is one good example. If the compiler sees there is a chance for pointers to alias, it will assume they do, and it is going to give you the worst-case, slower code. The same goes for global data: if you use a lot of globals, the compiler cannot assume they are unchanged, so for every access you make it is going to emit a load or a store. A do-while loop counting down is better than a for loop, one reason being that the termination check can be combined with the decrement. And you can use compiler annotations to help the compiler.
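The padding point above can be checked with sizeof. The struct names here are made up, and the exact sizes assume a typical ABI where uint32_t needs 4-byte alignment:

```c
#include <stdint.h>

/* Same three members, three layouts. Sizes assume 4-byte-aligned
 * uint32_t, as on most 32- and 64-bit ABIs. */
struct badly_ordered {
    uint8_t  flag;     /* 1 byte + 3 bytes padding      */
    uint32_t counter;  /* 4 bytes                       */
    uint8_t  id;       /* 1 byte + 3 bytes tail padding */
};                     /* typically 12 bytes            */

struct reordered {     /* widest member first           */
    uint32_t counter;  /* 4 bytes                       */
    uint8_t  flag;     /* 1 byte                        */
    uint8_t  id;       /* 1 byte + 2 bytes tail padding */
};                     /* typically 8 bytes             */

/* GCC/Clang packed attribute: no padding at all, at the cost of
 * potentially slower unaligned accesses on some cores. */
struct __attribute__((packed)) packed_layout {
    uint8_t  flag;
    uint32_t counter;
    uint8_t  id;
};                     /* 6 bytes */
```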
Function attributes, variable attributes, and pragmas: you can tell the compiler a lot about your code to help it give you the best code. There are also intrinsic functions you can use for optimizing your code; again, intrinsics are compiler-specific, so watch out for those, and be mindful that a different compiler may have a different calling convention or similar differences.

Stay away from having separate debug and release modes, because you want to develop the code that you will run in production, period. For consistency, see how much debugging and how much optimization you can afford over the life of the project; using the same code generation for debugging and production is the way to go. Know the fine details of your system architecture: bus widths, memory types, flash sizes, and latencies are very important. And profile your code before you optimize anything; most of the time we jump straight to a solution and it's wrong, so use tools as much as you can that give you a really good picture of what your app is doing. Utilize the tools, don't fight them; most of the time there is a reason why they are doing what they are doing, so help them to help you and they will help you back. And avoid assembly if you can; write everything in C.

That's pretty much what I had. We are almost out of time, so we are open for a few questions.

[Audience question about const] Yes, when you know that your data is not changing, it's good to use it. It's always good to use it, because it tells the compiler that this is constant data it doesn't have to keep reloading from memory, and if a constant can be reconstructed, it will be. [Further audience questions, inaudible] Yes, I think it's always good to make your aliasing intent come out clearly; the best scenario I've seen is where you don't let the compiler guess.
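The aliasing point from the Q&A can be sketched with C99's restrict qualifier; the function here is a made-up example:

```c
#include <stddef.h>

/* restrict promises the compiler that dst and src never overlap, so
 * it does not have to re-load src[i] after every store through dst,
 * which is the worst-case assumption it must otherwise make. */
void scale_noalias(int *restrict dst, const int *restrict src,
                   size_t n, int factor)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * factor;
}
```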
You tell it clearly that you are not aliasing. So, thank you very much, it's been a pleasure. [Applause]
Info
Channel: The Linux Foundation
Views: 37,538
Keywords: openiot summit, internet of things, linux foundation, embedded linux, linux, embedded linux conference
Id: GYAhbYnObLI
Length: 52min 38sec (3158 seconds)
Published: Tue Feb 28 2017