CppCon 2016: Jason Turner "Practical Performance Practices"

Captions
All right, as the main slide says, this session only works if you interact with me; otherwise I'm just talking to myself, and that's not as much fun, so make sure you're close enough that I can interact with you. I actually asked them to dim the lights a little so I can see you. So I guess we'll get started, if my clicker is working. My name is Jason Turner. All of the presentations I have ever given are on my GitHub page, so feel free to check those out. You may have heard of CppCast at the conference; I co-host it. ChaiScript is a toy project of mine that I will be talking a lot about today, not directly, but through the lessons I learned while working on it. And I really prefer interactive sessions, so please stop me, ask questions, yell out from the audience, whatever you need to do.

Optimizing compilers are amazing. What would you expect this to compile to; what do you think the generated code would look like? It returns one; the compiler optimizes everything else away. That's pretty cool. What do you expect this code to do, return two? No, it doesn't just return two. In fact it has to do all kinds of things: creating both of the strings, the code for creating their buffers and deleting them and everything else, to the point that I actually suspect the very first slide I showed you is a special optimization the GCC people put in somewhere, because it only works in a few select cases. So optimizing compilers are amazing, but trying to predict what your compiler is going to do is a bit of a risky game.

As I said, this comes from my experience with ChaiScript. Measuring ChaiScript's performance is difficult: I have a lot of templates in it, and the nature of scripting means that execution is spread over a bunch of similarly named functions. I'll show you what I mean. This is a simple script: I am initializing x to 0, and I am looping from 0 to 100, basically summing the values 0 to 100 in the loop.
This gets parsed by ChaiScript as a file node that has an equation setting x to the constant 0, then a for loop with an equation that initializes i to 0, a less-than comparison, an increment of i, and then the block where I'm actually adding i into x. These are all implemented with polymorphism in C++, so basically I have a bunch of functions called eval_internal being called, and my profiling data looks like this: eval_internal spending 100% of its time calling a function called eval_internal. It's not helpful. So as I worked on optimizing ChaiScript, I came up with a bunch of rules for myself for how to produce better-performing code by default.

Which is better for the normal case, std::vector or std::list? The question was how I define "normal": whichever one you default to unless you have a reason to choose another. And in fact I'll just say I disagree; std::list should almost never be used, but maybe we can talk about that more later. So, OK, why is vector better than list? This is a big room. What was that? Cache locality; someone said locality of reference, same thing. What else? Less overhead, and in what way? Right. With std::list, what do we have to do here? We're creating a list of one element. What does the compiler need to generate; what are we going to do at runtime? We are going to create a list node and set up its delete to be able to destroy the list node. We have _Unwind_Resume in there because we have to handle the case that an exception (out of memory, say) is thrown while creating the list node, so the exception handling is in there. We have to allocate the node, handle a potential exception, assign the value, hook up the pointers from the head, and then delete the node, and other stuff that maybe we don't quite appreciate.
What does std::vector have to do for the same thing? Allocate the internal buffer and set the first value to 1. That is what vector does: allocates a buffer, sets the value, deletes the buffer. So, for better-performing code, don't do more work than you have to. What about std::array? Anyone? One instruction; pretty much, it XORs the return value and returns it. It completely optimizes the array away altogether because we're not using it: it's something created on the stack, all POD data, it just goes away. So don't do more work than you have to. For containers I say: always prefer std::array first, then std::vector, and only reach for something else if you need a container that gives you specific behavior, and make sure you understand what your standard library has to do when you ask it to use one of these containers. I've never found a place in my own code where list was actually faster. I had places where I thought list was faster, so I used it; then I profiled, and now I don't have lists anywhere in my code anymore.

All right, what is wrong with this code? "Not using RAII"? I'd say that's not necessarily true: s is created on the stack, it does its thing, it manages its memory. "You should set the value with the constructor." So the problem here is that we're first constructing the string object and then reassigning it. If we use this rule of always const (by the way: always const), then you get `const std::string s = ...`. We're using assignment initialization, which is equivalent to calling the constructor, so we construct and initialize in one step. That's 32 percent more efficient than the previous version. Well, what do we do in a case like this, where we have potentially a bunch of different switch cases we want to take, some sort of complex initialization we're working with? "Use a lambda": that's exactly what I wanted to hear.
So you can do this, and you get the exact same savings as the previous version, as long as we're always applying the rule of always const. With complex initialization we're going to use an IIFE. People don't say that in the C++ world; I like to. It's from JavaScript: the immediately invoked function expression. If we really want to, we can call it something different in C++, like the immediately invoked lambda expression, I don't know. So what's the efficiency problem we're looking at now with this code? What was that? Right: I'm default-initializing my string and then assigning to it. I've found in practice that not enough people really take advantage of member initializer lists, and hopefully that's not this group, but we have the same issues as in the previous examples, so we use our member initializers. And we've got this version using the move idiom: we're assuming that, since we're going to take our own copy of the argument, we take it by value and do a copy and a move. Same gains as using the const initializer.

But now, can anyone find any efficiency problems with this code? Yes: depending on my usage, I could benefit from caching val. That's correct; val() parses the string on every call. So let's write a caching function. This looks good, right? Now when we call val() we check whether the value has been calculated, calculate it if it hasn't been, and return it. Good? What's that, "don't store the string"? We're not there yet. All right, I'm not hearing much from this side of the room. Yes: I'm not actually setting the is_calculated flag. Oh, and we have another problem that I forgot: the C++ Core Guidelines state that const methods should be thread-safe. That's one of our best-practice rules, and we're not setting is_calculated. So we're going to fix both of those things by using mutable atomic values, and now all the code is good, right? Now we're thread-safe, and we're not calculating things more than once.
The question was: shouldn't I pass the constructor argument as a const reference? You can find much debate on this; the world seems to agree at the moment that if you know you are taking a copy of it, as I know I am, then passing by value and using std::move works, because if something movable is passed to you, then you move it and save the copy, whereas with a const reference you always have to make a copy. Right, so based on my usage it might be more beneficial to never actually store the string at all. We've still got problems: we're doing a branch every single time val() is called, and we have our atomics, which are slow. And really, this is the result we wanted to get to: we had no reason to store the string in this code. This is directly representative of code I had in ChaiScript, where every time an integer was evaluated I was literally re-parsing the integer. With this version we've got no branching, no atomics, and a smaller runtime, because some of the string code didn't have to be brought in. In the context of ChaiScript, it took me about two years to realize I was doing this on every call, and fixing it resulted in a 10% performance improvement across my entire system. So this is my next rule: if it looks simpler, it's almost always better. So: always const, always initialize (an IIFE can help you initialize), and don't recalculate values that you should only calculate once.

All right, what's wrong with this code? I'm going to stand on this side of the stage now and make the cameraman work harder, so that I can hear from these folks. I'll give you a minute. It's too small to read? Oh, the red; I'm sorry, I don't know if I can do anything about that. Let's see if I can break my whole presentation... it's only zooming the... this doesn't do any good. I'm sorry, I can't do a whole lot about the font colors at the moment, unfortunately, but this is one reason why, on my title slide, I
said everyone needs to be close. So, can anyone read it well enough to have an idea what's wrong? OK, all right. What I have is a struct called Base (and it's true that red is a very bad color, I apologize). It has a virtual destructor, and then I have a struct called Derived that derives from Base and also has a virtual destructor. I'm doing this because, presumably, if I have virtual methods, a virtual do_a_thing in the base class, then I am going to want a virtual destructor; most compilers will give you a warning if you have virtual methods without a virtual destructor, so that you know you're cleaning up your objects correctly, that kind of thing. With that explanation, does anyone have any input? Yes: part of the problem is that in Derived I don't need the virtual destructor, because the inherited destructor is going to be virtual. The other problem is that by providing a destructor at all, move construction and move assignment have been implicitly disabled by the compiler. By providing our own destructor, we have told the compiler that we are doing something special with the lifetime of our object, and therefore it will not create move assignment and move construction for us. And, as was already pointed out, the virtual Derived destructor is completely unnecessary. So this is my next rule: don't accidentally disable move operations, or, more succinctly, use the rule of zero. Here I have defaulted the copy constructor, copy assignment operator, move constructor, and move assignment operator (and those really are bad colors up there; hopefully that doesn't continue to be a problem). Fixing this in one of my commonly used classes in ChaiScript resulted in a 10% improvement, and I have Scott Meyers specifically to thank for it, because of Effective Modern C++. I was reading one of the pre-release versions that I bought online from O'Reilly, and I said "oh no, what have I done," and I went through and fixed my code.
I'm sorry, what? You might need to use the microphone. The question is why we strive to make the base class copyable and movable when it's a polymorphic class that could be sliced: why did I provide the defaulted copy and assignment operations on the base class at all, and why did I leave it copyable if it's a polymorphic class? Because this is an example on a slide that I maybe didn't fully think out. The point, really, is about not disabling your move operations, and also about not providing the virtual destructor in the derived class; maybe that point itself wasn't as clear as I could have made it. But in Derived, if I had kept that virtual destructor, which is a mistake I have personally made hundreds and hundreds of times, because I think "hey, I'm in a virtual thing" and don't think about the fact that I'm inheriting my virtual destructor from my base class, then I would have again disabled the derived class's ability to move, at multiple levels.

OK, is this legible? The words you can't read are `struct` and `int`. So we have a problem: we're copying s into our struct called S. Can everyone see that, at least well enough? So we know it copies; bad. So we're going to use move. Right, now we are moving our string into our struct, and we get this kind of double layer of move: moving our temporary s into the constructor for capital S, a poorly named struct, and everything's good, right? Yes, the string s isn't const, that is true; I'm not following my own rule there, but that's not where I'm necessarily going with this point.
By using move, this version is 29% more efficient, by the way; we're talking the same kind of 30% efficiency gains by not making that copy, and we've got a 32 percent smaller binary in this case, because less of string is instantiated. Good. But what's better than moving the value? Wait, I see a hand up; you're supposed to yell. Yeah: why not just do the string-plus expression in the constructor call? So if we do it this way, with a not very short string (by the way, that's to get around the short string optimization, if you've seen anything about that at the conference, because it really will throw off your numbers when you're trying to measure this kind of thing), I get another 2% efficiency gain by not creating that extra string on the stack. Unfortunately this sometimes leads to less readable code, but in my opinion it's much more maintainable than having move operations sprinkled everywhere. Moves are easy to use wrong and to overuse, and then you accidentally use the thing you moved from when you didn't realize you had. This is better, usually. Why don't I use a string literal (the `"..."s` suffix)? Because I wasn't aware of it when I made these slides; I have since been made aware of it and didn't update them. The question was why this way is faster: in this code, the string s has a lifetime. It must be created on the stack and then destroyed; even after it's been moved from, there is still something left whose destructor needs to be called and which needs to be cleaned up to some extent. That's pretty much a no-op, but it is something. I have had comments before saying that what we really need is to add destructive move to C++, but we don't have that today. So we're taking the concept of "don't declare a variable until you need it," which is something C++ is very good at. You can really tell C programmers from C++ programmers, because they like to declare all their variables at the top of the function; they had to. We don't do that in C++: declare variables as late as possible, or don't even declare them at all if you can get away with it.
OK, the word you can't read is `auto`. So, can anyone see any problems with what we're doing here? This is one of my favorite examples, actually. Well, you're raising your hand too much. What? When I'm passing it to a function, I am incrementing a reference count. Why am I incrementing the reference count? Because I'm passing it by value, is that what I heard? So we have copies being made of the shared pointer because I'm passing it by value. So we fix it with a const reference, and we're no longer passing by value. Except, what was that you just said? Yes: I am creating a shared_ptr of type Derived but passing it to a function that expects a shared_ptr of type Base. Does anyone understand what's actually happening here? I'll wait a minute if no one else raises their hand. Right: that has to create a temporary of type shared_ptr<Base>, calling a conversion constructor, if you will, from the shared_ptr<Derived>. So we get here: don't pass shared pointers, in general, is a good rule. By going this route and not passing shared pointers around, we get two and a half times faster than the last version. I'm sorry, what was that? Oh yes, I forgot to fix that bug; I knew it was there. I am asking for a pointer and passing it to a function that expects a reference; I should have dereferenced it, sorry. OK, it's two layers of errors, so I could fix it one way or the other: I could fix two lines of code, or I could fix one line of code by taking a const Base*. But I wouldn't want to do that, because you don't want to pass around bare pointers if you have any other way to do it. So that should take a reference, and there are two bugs on this slide, at least.
Does everyone know what std::endl does? It flushes. Why am I using iostreams at all, is that what I heard? Let's assume we have a good reason to be using an ostream. std::endl is the equivalent of a newline plus a flush. Expect that flush to cost you at least nine times overhead on your I/O if you don't need it, depending on your platform; Windows is much higher than that in my experience. So this is my real-world anecdote that I like to show, and there's no red on this slide. We had this function called write_file that took an ostream, so we would output all of the data that wanted to go to the output file to that ostream, and then we had our get_file_as_string. The two things I want you to focus on are mostly the two bottom functions. This is exposed to Ruby via SWIG, and the users of our Ruby library were saying, "hey, if I call your write_file function, it's an order of magnitude slower than if I call get_file_as_string and write it out myself in Ruby," and I said "no, no, no, that's impossible," and then, like, two years later I realized why. So unless you have a really good reason to do a flush, avoid endl; that's where we're going to go with that. Just use a newline. Actually, for some reason I had never really thought about this before, but when I was reading some examples from Stroustrup, I noticed that the newline is a single character; you may as well pass it as a single character instead of using double quotes. I actually looked at the difference in compilation, and it really doesn't make that big of a difference, but let's treat single characters as single characters. So, in summary for our "don't do more work than you have to" section on hidden work: calculate your values only once, at initialization time; obey the rule of zero; if it looks simpler, it's probably faster (this almost always holds true); avoid object copying; avoid automatic conversions; never pass smart pointers unless you have a really good reason to; make your conversion operations explicit; avoid std::endl.
So, on the topic of shared_ptr, what happens on this line of code? The red thing is an int; I'm making a shared_ptr of int and creating it with the value 1. Creating a shared_ptr<int> is bad, but you might have a legitimate reason to, and there is a better way; I just want to get to the question of what the computer actually has to do. Right: it has to allocate the memory for the object, allocate the memory for the control block, and increment the reference count, and then it has to do all of those things in reverse order. Which is totally correct: we've got all of this assembly generated by the compiler for one shared_ptr instantiation. What do we do now? Now we allocate a buffer and then delete it, right? OK, what can't you read in red? You cannot read the assembly instructions, but as I said yesterday, the rule of thumb with assembly is that less is better than more. We are creating a buffer for our integer, assigning the value, and then deleting it. Everyone good on this? It's pretty straightforward. And this is how it compares to manual memory management (oh man, my clicker's jumping around on me): make_unique versus by hand, they are exactly the same.

All right, we're wrapping up part 1 (this is a two-part talk), and I guess we're about on schedule. So: avoid shared_ptr; avoid std::endl; always const; always initialize with meaningful values; don't recalculate things that you know are immutable. Do we have any questions before we move on? Yes: with make_unique, operator new could possibly throw; why was there no exception handling? I don't know; it doesn't appear in either case, so I know that make_unique isn't adding or removing anything. You might have to go to the microphone.
I'm sorry, I just wanted to ask another question: at the beginning you talked about not providing a virtual destructor; maybe in that case you should have used std::function instead of deriving from a base class, since it's sometimes faster? Yes; I just wanted to illustrate, with mistakes I've personally made in my own code, that when you do have a reason to be using inheritance, and you have a class hierarchy with virtual function calls, you should be careful not to accidentally throw away your move operations.

So, the next part: smaller code is faster code, and this one's got a lot of red on it. Does anyone know if Chrome has a keyboard shortcut for "put me in high-contrast mode" or something like that? Windows plus O? Oh my goodness. Yeah, how do I get it back? Oh... well, that's my code sample. In all seriousness, how do I get it undone? I'm sorry, what? I tried that, actually, and it doesn't seem to like the dual-monitor setup when moving the mouse, so I just want to make sure I can get back and forth. OK, that's good. So I should probably remember what the point of the slide is, now that I've gotten distracted by zooming. We have this kind of contrived hierarchy of templates. We've got our derived type; it's sort of contrived, but it's related to real code that I've used. I'm going to see if I can zoom in. OK, is this actually readable now, even with the red? All right, cool. So we've got our struct D, which is our derived type, and it has an overridden virtual method from the base class called get_vector, returning a vector of int, and the derived class has its own vector of int. So what are we doing poorly in this code? This, again, is a mistake I've totally made. What's that? Returning by value? Let's assume there's a good reason to return by value; this is a contrived example. Why do I have the template? Because it's a contrived example, but it is actually a stripped-down example from, let's say, a case where
you are implementing your own version of std::function. You probably have a base class that has some common things in it, and then each instantiation of the template is going to be doing similar things. I'm going to zoom back out. With many template instantiations, this code really starts to blow up. The problem was that there's no reason for the vector of int in the derived class to be there at all; it should be in your base class. If you're doing anything with templates and inheritance, you're potentially going to run into this kind of thing. So I told myself: don't repeat yourself in your template code. I've heard other people call it "de-templatizing" code: move everything out of the template that you can, because it adds a lot of overhead for the compiler.

So, taking a factory example (yeah, that's the whole example): we have this factory of integers, essentially, and I've got my factory function that returns a make_shared, and my main that is creating a vector of these shared pointers. Now, let's all ignore the fact that having this vector of shared pointers is probably a bad idea; I want to get to a point, so let's assume there's a good reason to have a vector of shared pointers. Herb Sutter, in "Back to the Basics" (which I think was from GoingNative), at 19 minutes in, says you should prefer returning unique_ptr from your factory functions, and we already saw that shared_ptr is much bigger than unique_ptr, so let's simply not instantiate more shared pointers than we need to. So in this version our factory function calls make_unique and returns a unique_ptr, and main is creating a shared_ptr from it. Yes, now we have two memory allocations instead of one, because we're not benefiting from make_shared's memory coalescing; I will attempt to prove to you that that is not as beneficial as it seems it should be. So, with this version where we're
using make_unique and returning unique_ptr, how many different instantiations of the shared_ptr template are actually being created here? What's that, four? No. One? Yes, one. We're creating one, because our factory is returning unique_ptr of type B, so there's exactly one type of shared_ptr created, which is shared_ptr<B>, constructed from that unique_ptr. And I really apologize for the colors. Taking the three possible scenarios (the red that you can't read says `template` and `int`, but hopefully the rest of it is fine, because I kind of need this all to be on one screen at once), here are the three possible scenarios. Returning unique_ptr and using make_unique to do it takes 1.3 seconds to compile, results in a 30K executable, and uses 150 megs of RAM, because I guess that's what C++ compilers do. If we return a shared_ptr and use make_shared, which was our initial example, we're now up to 2.24 seconds to compile that example; we're only talking thirty different instantiations of make_shared, that's all that program was doing, and now we're up to 2.2 seconds to compile and a 70K executable, more than twice the size, and it is now using 165 megs of RAM at compile time. And then the worst-case scenario: returning a shared_ptr and using make_unique to create that shared_ptr of type Base. We're now at 2.43 seconds to compile and a 91K executable, three times bigger than the first, and we're at 190 megs of RAM used at compile time. Everyone clear on what we're looking at? Am I comparing apples and oranges? No, I'm not, because in all of these cases the code I'm referring to is still going into this vector of shared pointers; I'm saying I need a shared_ptr.
These are my three possible ways of creating a factory that will eventually result in a shared_ptr. With all three of these cases, the one that compiles the fastest and results in the smallest executable is the one where I return a unique_ptr, use make_unique to do it, and then shove that into a vector of shared_ptrs, and that's because, like I said, it's only 30 different template instantiations we're doing in that somewhat contrived example of taking an integer. So these are real numbers from ChaiScript; these are the sizes of ChaiScript's executables, because I have shared function objects. If I use the best-case example of returning unique_ptr and using make_unique to get there, I had a 5 meg executable. If I returned shared_ptr and used make_shared to get there, which is what we said we needed, then it's actually a six percent slower runtime system, and it's a 7 meg executable now. And if I did the worst-case scenario, I'm ten percent slower than the best-case scenario. So, I'll get to this in just a second, but the point I'm trying to get to is: smaller code is faster code. Anything you can do, within reason, to make your binary smaller is almost certainly going to make your program run faster, and that is because of the cache: the smaller the program is, the more likely it is to fit in the cache. To get back to the point of using make_shared when we actually want shared ownership and we know that's what we want, coalescing our two allocations into one: make_shared is faster if we're talking raw performance. If you're creating very, very many short-lived objects, then the make_shared version is faster; it gets the advantage of doing only one memory allocation instead of two, and it has fewer conversions at runtime, because it doesn't have to convert the unique_ptr to a shared_ptr. But if you're creating very many short-lived shared objects in your system, you probably have a design flaw; that's a discussion for maybe another time, though.
If, however, you're creating long-lived shared objects, which in my experience is the most likely scenario, then you're going to want to follow Herb's advice (I'm just repeating Herb here, so you can blame him) and return unique_ptr, via make_unique, from your factory functions.

So, continuing down the road of "smaller code is faster code" (I am not running out of time, actually; oh, that worked, OK): I am creating a function object, and I'm using bind to get there. If I call f("world"), it's going to return "Hello World" from that function call. This is 2.9 times slower than just calling the function add directly, and it adds a 30 percent compile-time overhead and a 10 percent compiled-size overhead. So instead we're going to get rid of the std::function and just use bind. This is still two times slower than a bare function call and has a 15 percent compile-time overhead. Scott Meyers, in Effective Modern C++ item 34, says don't use bind, essentially, and STL also says don't use bind, if you've listened to any of his talks about std::function. So instead we use lambdas. At this point I have completely removed all uses of std::bind from my code, and almost all uses of std::function, in preference for lambdas wherever I can. This lambda version has zero overhead compared to a direct function call, and zero compile-time overhead compared to the direct function call. Everyone good on that? Sorry, I'm having to speed up a little bit here. So: don't repeat yourself in templates; avoid use of shared_ptr; avoid std::function; never use std::bind. All right, do we have any questions on part two? Okey-doke.
cost to accessing a static: every time you go to access it, you have to run code that checks whether it's already been initialized before you get to access it. Or maybe it's data that is part of your object and needs some sort of mutex protection, so to access that data you now have to take a mutex lock. Or perhaps it's in something like a map that doesn't have a trivial lookup cost. So it's just kind of a rule of thumb: if, in a function that's performance sensitive, I'm doing a lot of access to data that's not local to the function, I need to know why, and whether it really needs to be there. So again, this is my summary of what we've got so far. Ask yourself, "what am I asking the compiler to do here?" Matt Godbolt's Compiler Explorer, which I'm guessing you've all seen by now, is an excellent tool for that. Always const, always initialize, obey the rule of zero, don't do more work than you have to, always prefer std::array, then vector, then the other containers only if you have a really, really good reason that you can prove; avoid use of shared_ptr outright, avoid std::function, and never use std::bind. Yay. So, the result: this is the performance graph of ChaiScript over time. It's nearly a hundred times faster than it was six years ago when I started working on it, and as you can see, it's clear from the performance graph that eventually I'll be able to run any script in zero time. I have automated performance monitoring set up, so with every commit it runs some performance tests and gives me another little pip to let me know that I am not making things worse. One thing that I've noticed personally, that I find interesting, is that as I'm simplifying the code, making it smaller and making it faster (these lines represent different compilers), the performance of the different compilers is starting to
converge. I am finding less difference between the compilers as I make my code simpler and follow these rules. So, is everyone wondering, "why didn't you mention constexpr?" No one? Okay. Can we read this code? Those are templates... let's see if I can zoom in in a reasonable way. Not really. Okay, so this code is a constexpr is_sorted routine, and it just returns whether or not this initializer list of values 1 2 3 4 5 is sorted, and we do that calculation at compile time. We expect this to happen at compile time, like I said, because it's constexpr, and that's what we get: we get the value 1 returned. That entire program says, yes, in fact, that list of integers is sorted. What happens if we don't use constexpr? We get the same thing, assuming optimizations are enabled. With constexpr you don't have to turn on optimizations; with this you have to have at least -O1 on GCC and Clang. I'll say this about constexpr: if you do full enabling of constexpr throughout your code, I've noticed that it can make the code larger, because it can create things like data structures that exist in the data segment of your code; it can make the compiled size bigger, and as we have hopefully proven today, larger code is often slower. So I am relatively conservative with my constexpr use, but my advice would be: if you're going to do it, if you can say "hey, this data structure needs to be constexpr enabled," go all the way with it. Make sure you've got all the constructors, all the accessors, everything that you possibly can constexpr, so that the compiler can use it as much as possible to, in the best-case scenario, actually remove that data structure from your code entirely. Then, one quick bonus note on final, on proper use of final. Has everyone used final, now that it exists? Yeah? Okay, good, getting hands raised. If you tell the compiler that a method is final or a class is final, and you have virtual functions, it can inline
those virtual functions that are final, if it's able to deduce that you're in fact using it from a context where that's the final version. That can have a pretty significant performance impact. So, to sum up why this works: we're basically helping the branch prediction, in the compiler and in the CPU, do the best job it can. Simpler code has fewer branches, so the CPU is less likely to take the wrong branch. According to OProfile, the latest version of ChaiScript has 1.86 times fewer branches than version 5.1 did, and that's not even that far back in my history, and the CPU is able to have three times the branch-prediction success rate. CPU caches are hundreds to thousands of times faster than main memory, and smaller code is faster code because it's more likely to fit in the CPU cache; again according to OProfile, I am now hitting the last-level cache 35 times less often than I was, and I have a 1 percent better cache hit rate when I do have to hit the last-level cache. And finally, we're doing what our compiler authors expect us to do: if we're sticking with simple, idiomatic C++, it falls into the particular patterns that compilers like to look for and optimize. So then, what is next, in my opinion, from the optimization standpoint? This is the slide we started with at the beginning; this is what a parse tree of ChaiScript code looks like. It says "for" and "var" at the top, but that's our for loop from 0 to 100, and now I am running optimization passes on the ChaiScript parse tree before I ever try to execute it. So now I say, "oh look, that's a for loop from 0 to some number with a normal increment; I know what this is," and I just optimize it away from the ChaiScript. Now, I think nearly every project of any significant size that's written in C++ probably gets user input. Is that a safe assumption? Yeah, mostly, somewhat. So, are there ways that you can take your user's input and somehow simplify it, optimize it, put some sort
of pass in front of it before you actually try to execute it, and get that next extra level of performance that you need from your code? So, that's me again, and if there are questions... and if anyone wants, I have coupons up here for O'Reilly videos that I've done, for 40 percent off. So, any questions? Nope? Yes: can constexpr make the code bigger? Yes, it can; constexpr can make the code bigger. No, I believe what it is is... you know what, I need to spend more time on it, honestly, but my impression is that using constexpr just causes more data structures to be compiled into the code, therefore you have more data actually in your program code instead of something that would be calculated at runtime. I need to spend more time with it, honestly; I've only tested at a large scale, and I haven't tried to narrow it down to the specific thing that was causing my problems. Yes, you had a question, right: is there anything that we can do in our code to automatically detect some of these things? Yeah, like making sure that you're not accidentally disabling your moves and that kind of thing. I am not currently aware of any static analyzers that will report on most of these issues; if anyone is, feel free to chime in. You had a question first? "Yeah, I just want to point out, earlier somebody had a question on the shared pointer example, why it didn't have exception handling." Yes, and that's just an implementation detail: the implementation gets to decide when new throws, right, so it doesn't have to throw, so it doesn't have to necessarily always have exception handling. "Well, there might be a question of why there wasn't a stack unwind injected into main, though." I'm not sure... there was nothing to unwind, so the compiler optimized it away. In other words, if there had been more data in main, you know, more things going on in main, then maybe it would have had it in there. You're saying... okay. Yes? Uh-oh, sorry... no, go ahead. Okay, on the question of whether you can detect the loss of your copy and move constructors, there
is a brute-force way to do that, which is to do a static_assert after you've declared your class: do static_asserts of, you know, "is this movable," "is this whatever," using the type traits, and if you do that religiously then you will know, and your compiler will enforce that that's exactly what you're getting. "Okay, yeah. Well, you said that you have almost a 100x improvement in time, and I wonder what was the main cause of it: not doing the unnecessary work, or optimizing memory layouts?" It was chipping away at it, like 10 percent at a time. I looked back through my entire code history, and the couple of big wins are things that I did call out here, like not re-parsing my integers every time, so not repeating myself in my work, and making sure that I didn't accidentally disable my move operations. But yeah, the really big gains right at the beginning of that history were, like, really dumb things; I don't even remember what they were. Okay, anything else? All right, thank you. Time's up.
Info
Channel: CppCon
Views: 90,538
Rating: 4.8720627 out of 5
Keywords: Jason Turner, CppCon 2016, Computer Science (Field), + C (Programming Language), Bash Films, conference video recording services, conference recording services, nationwide conference recording services, conference videography services, conference video recording, conference filming services, conference services, conference recording, conference live streaming, event videographers, capture presentation slides, record presentation slides, event video recording, video services
Id: uzF4u9KgUWI
Length: 60min 29sec (3629 seconds)
Published: Sun Oct 02 2016