'Fastware' - Andrei Alexandrescu [ ACCU 2016 ]

Video Statistics and Information

Video
Captions Word Cloud
Reddit Comments

I love his talks

๐Ÿ‘๏ธŽ︎ 22 ๐Ÿ‘ค๏ธŽ︎ u/starTracer ๐Ÿ“…๏ธŽ︎ May 16 2016 ๐Ÿ—ซ︎ replies

I was there. The atmosphere at the keynote was a riot, video does not do it justice. Everybody talked about it for the rest of the week. They waited in line to offer Andrei beers and even those who work on concepts came to thank him for "destroying". This was one of the best talks Ive seen and I'm old and jaded!

๐Ÿ‘๏ธŽ︎ 20 ๐Ÿ‘ค๏ธŽ︎ u/rose_of_sarajevo ๐Ÿ“…๏ธŽ︎ May 16 2016 ๐Ÿ—ซ︎ replies

Just uploaded the slides here (with slightly different benchmark results) in order to simplify matters for people who'd like to implement the partition algorithm. AMA!

๐Ÿ‘๏ธŽ︎ 28 ๐Ÿ‘ค๏ธŽ︎ u/andralex ๐Ÿ“…๏ธŽ︎ May 16 2016 ๐Ÿ—ซ︎ replies

To some extent, optimization is to our industry what sexual intercourse is to teenagers. Thereโ€™s a veil of awesomeness surrounding it; everybody thinks itโ€™s cool, has an angle on it, and talks about it a great deal; yet in spite of ample folklore, few get to do it well, meaningfully, or at all. Improving the ordeals of teenage years being too large a project, the next best thing to do is teaching how to write fast code. So Andrei set out to write a book about it.

This may just be the best sales pitch for a programming book I've ever read.

๐Ÿ‘๏ธŽ︎ 6 ๐Ÿ‘ค๏ธŽ︎ u/vanderZwan ๐Ÿ“…๏ธŽ︎ May 17 2016 ๐Ÿ—ซ︎ replies

Niice talk - Andrei is one of my favorite presenters.

Just wanted to mention that the kind of introspection and compile-time branching mentioned in the talk are possible in C++14, albeit with a less convenient syntax.

It is possible to branch on compile-time conditions using static_if (example from my C++Now 2016 talk) and to check for the availability of member functions and fields using facilities similar to boost::hana::is_valid.

You could also elegantly combine various function that depend on compile-time conditions using constructs such as fit::conditional.

๐Ÿ‘๏ธŽ︎ 6 ๐Ÿ‘ค๏ธŽ︎ u/SuperV1234 ๐Ÿ“…๏ธŽ︎ May 17 2016 ๐Ÿ—ซ︎ replies

I am not sure about his arguments against C++ concepts or at least not sure yet. But the partition algorithm seems a great win. How do we get it in STL?

๐Ÿ‘๏ธŽ︎ 4 ๐Ÿ‘ค๏ธŽ︎ u/o-genki-desu-ka ๐Ÿ“…๏ธŽ︎ May 16 2016 ๐Ÿ—ซ︎ replies

His partition algorithm is still overly complicated, at least in the context of quicksort (where we can assume that the pivot is the median of at least 3 elements). In pdqsort I have what I believe is an optimal partitioning scheme. I'm in the process of writing a paper on it, which is almost done.

For quick reference, this is the partition algorithm that puts all equal elements on the left:

template<class Iter, class Compare>
inline Iter partition_left(Iter begin, Iter end, Compare comp) {
    typedef typename std::iterator_traits<Iter>::value_type T;

    T pivot(PDQSORT_PREFER_MOVE(*begin));
    Iter first = begin;
    Iter last = end;

    while (comp(pivot, *--last));

    if (last + 1 == end) while (first < last && !comp(pivot, *++first));
    else                 while (                !comp(pivot, *++first));

    while (first < last) {
        std::iter_swap(first, last);
        while (comp(pivot, *--last));
        while (!comp(pivot, *++first));
    }

    Iter pivot_pos = last;
    *begin = PDQSORT_PREFER_MOVE(*pivot_pos);
    *pivot_pos = PDQSORT_PREFER_MOVE(pivot);

    return pivot_pos;
}
๐Ÿ‘๏ธŽ︎ 6 ๐Ÿ‘ค๏ธŽ︎ u/nightcracker ๐Ÿ“…๏ธŽ︎ May 17 2016 ๐Ÿ—ซ︎ replies

Jebus, this belongs in /r/ProgrammerHumor. Thanks for the laughs. Great talk!

๐Ÿ‘๏ธŽ︎ 4 ๐Ÿ‘ค๏ธŽ︎ u/fubarx ๐Ÿ“…๏ธŽ︎ May 16 2016 ๐Ÿ—ซ︎ replies

u/andralex you should do standup. It's IT standup. silicon valley. this is gold!

๐Ÿ‘๏ธŽ︎ 6 ๐Ÿ‘ค๏ธŽ︎ u/[deleted] ๐Ÿ“…๏ธŽ︎ May 16 2016 ๐Ÿ—ซ︎ replies
Captions
it is with great during this I'm very honored to be invited at this prestigious conference ACC is the best in England right read right let's keep you that way I'm one out from now I don't want it to be a formally prestigious conference all right so Julie also asked me to ask you make a couple of other announcements please turn off your cell phones lunch will be served as usual at is your right time and what's here oh no go to users are allowed in this room Julie what so look to your left and right and if there's anyone securities at the door to escort them out so today this treasure everywhere just got a look for it this is a talk about efficiency however it's also a keynote and I don't know how to deliver a keynote so my work in title was how to write and deliver a keynote so I asked a few friends for more versed in delivering keynotes and that's you know what in a keynote what should I do and they say oh this is great just talk about anything you like like Andy Warhol you know like paint what you like and he paint I like soup he painted soup that was it he became famous so and the other said should have no code there no math it should be some sort of a fluffy thing just feel good kind of thing a keynote I can't this is the exactly the kind of talk I wouldn't go to so this talk is going to have a code and actually implied math algorithms right and measurements because so talk about efficiency but you know this thing about Dogma thinks I like is kind of intriguing because consider this what if I talk about something I like and you don't like so that is my friend and he said oh you don't care don't Aquino the same conference twice I mean badgy I put that on my LinkedIn resume and there's this guy reading you know delivered keynote ACC you twice Jesus Janet I found our next CTO you know not gonna go like that right all right so things I like so it's really interesting to think about things you like and of course I can't like I can't put here talking about things I like because we would enter infinite recursion here so one thing I like is confusing aphorisms may give an example you know about this right when you hear hoofbeats think horses not zebras the meaning being go with the most probable right so use probabilities and likelihoods and all of those nice things from statistics that we know from college you know and I was thinking must confuse the hell out of Africans what what do you mean we don't have horses what all of that statistic stuff they teach in college doesn't apply Jesus what's wrong with these folks right so must be very confusing to Africans going forward I like fast code we're going to talk about it today and there's one one thing I like I like the process concept and in order to explain why I like see process concept we need to take a fork in the road I have two sets of slides have the politically correct slides and I have the other slides hands up if you are the politically correct slides nobody zero all right so we're in together going forward okay in order to explain what I like about zebras concepts I need first to talk a bit about toilets it's not what you think it's not what you think so give me give me a minute here let me let me explain I can explain famous last words right so there's a company called photo that makes a wonderful Washlet anybody who's been to Japan who's been to Japan awesome I'm sure you can confirm when you come back from Japan you feel like you're going back in time in the Middle Ages because they have washed it that this wonderful toilet that wash it's awesome it's like it's like going to the Future and then you return it's like oh my god what am I doing here it is so dirty right that is awesome yes you can get those in Switzerland as well and you can get those in the States as well so you know it looks like it's like a toilet cover and it has a little panel on the right with buttons and you know prints bumps and it washes whatnot awesome so this company offers in the state's eight washing models of which one does not have a rim both control that's the one I bought I bought the entry-level model I'm not kidding all right so picture this remote control on a toilet you got to make sure that your partner yeah so this is kind of interesting like I was thinking what was the motivation behind that because signal the toilet is the very definition of a captive audience just it you can't be in the living room and say you know I could use a watch right now honey where's the remote it's just boggles your mind doesn't it but I figured that washes with remotes are a thing and here's why I think our thing because they combine two nice things what's better for guys signal toilet and flipping channels first it's like genius so they sell right seven models or out of eight have a remote control not kidding the remote control on a toilet there are complex and interesting it's like technology its electronics is not electrical devices it's an electronic device right modern and cool but are likely to be very useful in practice you know and kind of oddly this is uh this very list is what I think about supposes concept that's why I needed to take this little departure right there's a great blog post why constants didn't make 2017 so it's a fascinating read it's amazing you've gotta go and read it so blog just Google for it amazing and they're like several reasons listed and one of them was there's not enough time when was the when did they start after like four versions starting 2003 thirteen years there's not enough time and there's like one guy working them in on his spare time and all the community was like when you know two million people was kind of waiting for that one guy to be done nineteen of time another interesting thing is the concept es does not specify any concept definition so much about this driven development huh but wait there's more this is like the most amazing thing use of concept sometimes results in worse error messages the first paragraph in the first proposal for concerts listed better messages as the prime motivator of concept first paragraph of the first proposal it's there at the top is the first thing they see you know we're going to have better error messages and then you know the you know just so happens if they go through with it and Indians like you know what sometimes we get like worse than no messages picture this we have we have an encryption routine sometimes only sometimes it produces clear text not-not-not clear clearer than it was it makes for better literature you you know you put you put like a bad writer you get Shakespeare back sometimes not most of the time get good encryption or we have a short routine sometimes it increases entropy of the data it scrambles even more the data that was there so I find it kind of interesting so what I'm going to be looking forward to is the next blog post which is going to be white Constance did make C+ is 20 you know just bear with me for a second here bear with me so maybe maybe humor me maybe just work with me a little here maybe if 13 years was not enough and if sometimes the where messages are worse just maybe maybe it wasn't such a good idea just just a hypothesis I'm putting on the table here right enough about that let's talk about sentinels who knows what sentinels are in programming he put a Sentinel you optimize a bit yeah you save a bit of tests remember linear fine I'm Strang some decode on you this may be the first year ever I can spring some decode on somebody unannounced and not be left out the door so I'm going to use deep for a couple of reasons but don't worry if you don't know deed there's no problem this fine function is generic he takes any range in any xi type range and ii these are template parameters as it were and these are the normal function arguments so this is the syntax for introducing a template function and the range must have three primitives it must be testable for emptiness it must be able to advance upfront and it must have access to the first element front so i'm moving through the range left to right and as soon as i find the search element i'm going to break the loop and then I'm going to return the balance of the range what's left so I'm going to position the range at the Fond element if there's no element founder what whatever I'm going to return the empty range so I'm going to you know I'm going to exhaust the range and this is a very useful function it's akin to find in STL with iterators it may return the end iterator in this case it returns the empty range no surprise here implemented many times in many languages many api's many you know etc etc so how do you apply a sentinel to this sentinel before we do that let's discuss indexing so a range may be indexed or we could use you know just this sort of like linear approach which one they think is faster like indexing versus just pointer just bumping the pointer who's the who things like just pointers are going to be better pointers of indexing the rest abstain is the same okay it depends right so by my lab but so this kind of changes every couple of years it's kind of weird so by my measurements um it's just a slight edge of for indexed axis I just I just measured so my explanation is that there's less aliasing so the compare has a better time optimizing code if there's one pointer and a bunch of indexes it's easier for the compiler to figure things out than with multiple pointers and one nice thing is that the CPU has very sophisticated means to to do index addressing so it has like a unit that in Hardware does all that all the addressing arithmetic so you know why not use indexing so let's do that so I have I'm starting with an index and I go from zero to the length of the range and I'm going to bomb the integer and if the range at this position is the element and I'm going to break the loop and here I'm going to return the balance of the range from the ice element all the way to the end of the range right again if even if you don't know D it's quite clear what's going on here right all right let's make this faster finally we got to the part where we want to optimize so how do we make this faster talk to me talk to me put this ending at the very end right and do search and what we get rid of this test right and if the comparison is cheap enough then we're going to be able to measure something right nice so let's do that I'm going to save this guy into a temporary I'm going to plop the element into the last tenant of the range and here I'm going to enter scope because what I want is to automatically whatever happens in this loop I want to make sure that I'm restoring the range to where it was please former glory right this is important yeah I don't searching the range to modify the range right so I want to make sure in any case even though for example comparison throws an error or I have like some weird stuff going on I want to restore the range to where it was Anke I'm going to do a little fix-up right because you know I made I found the last element and I've got to make sure that if it's the last element I got a look a bit better you know was I looking at the right thing or not and if not I'm going to bump I in and so on makes sense okay so there's there's a number of unexplained things here by the way um I just got news yesterday so this is like a scope X it executes a any statement automatically when this a of scope is going to be exited and I found it yesterday that Peter summer Lud has a proposal for adding a similar thing to C++ requiring a lambda all right thunderous applause okay thank you very much those were for Peter all right awesome so it's just going to be measurable faster what do you think slower push with the same who's with faster this is going to be faster depends on the size if you search fewer elements is not going to win you anything what if you said 20 elements maybe but before there's the number of unexplained things you know this whole indexing thing is not not everything is indexable you know there are some ranges like you know like iterators you know input forward be directional and random access is the most powerful so not everything is indexable the worst thing is that this guy may throw and that's really bad because if if this flows as the scope is exited I have two problems I have an exception and I have a range that I changed inadvertently during the search so I want to make sure that that operation never flows and indeed for some types is not going to flow incidentally they're exactly the place for which this optimization would make sense because they're cheaper to compare right so hmm but wait it gets even worse find itself does more for all types you need to make sure that it will do I only pass Ranger Street and comparable elements and stuff so we are like a number of issues here that are kind of in the air right now and we need to we need to solve so we need to for example not only you want to restrict fine to work with ranges and their elements but we want to choose the right implementation automatically we Sentinel for certain types index for others and conservative for the other others right so we have a number of issues with regard to getting that code production-ready right here's how you fix it you're going to put a constraint on this fine function here by saying well range find range da da da da it only works if the range is an input range this is from from it's a library function it's a library compile time function I should say and the type of the comparison over that front with these bull who likes Feeny who likes screen aid who hates Feeny all right security so this is like students turret because if this doesn't compile if this comparison doesn't compile it just returns false it doesn't stop compilation it says you know this is not true the fact you know you're testing that the type of our front equals C is bull and if that fails it's just that false which means if you try to call fine with the wrong things you're simply going to log it's not going to be there it's going to bow out right great so with those we fixed things because this is all we need for to make find the conservative version work that's going to work with the front pop front thing very nice um a word of caution here because it's important II must look like an expression syntactically right it might it must be possible as an expression you can't put like an unfair brace in there right you can't put like just about anything the mozzie something looks like an expression it may not be an expression after semantic checks but it must look syntactically like grammatically like an expression right so now okay so we have we solved that problem but how do we choose with and without Sentinel antistatic if we static if we get to do to choose inside the function so we have you know if he's in put it around here he actually I replaced a bit just because there's no space on the slide I put : here me meaning the type of this guy converts to bull is not exactly bull which is fine if it returns in time fine alright static if the range is random-access then has slicing because I'm using slicing here is the slicing this is a slicer I can take a slice from the range so if the range is random accessing has slicing then I need to use either sentinel or index center if the range is writable and index if it's not writable because there may be a range of with immutable elements for example you can't write to them right so then how do I test that that operation doesn't throw well I'm going to write the lambda and I'm going to try to take its type the lambda is now throw what if this would throw an exception this doesn't compile this is a capability check in C++ no accept is not checked statically right debatable decision to be euphemistic about it debatable right what is that checked or not it's not check right it's becoming part of the pipe system meaning it's going to be checked you can check it if you so want yes great terrific so here I'm going to say is the type of an I don't care if the type is something you know I know the type is going to be the type of a land or returning void and being no throw but my point here is that I insert some code that must not throw and I put Noster on it and I check the capability and I'm done so if this so happens that I can assign to an index I'll access into the range something else it doesn't matter then I can use the central implementation in other words I can using your index implementation and if nothing works I'm going to have to use the conservative implementation and that's how I select three algorithms completely different depending on the capabilities of the pipes that I got make sense it has a name I gave it to it it's called designed by introspection and if you hear of it later I liked it before it was cool right all right please note this is a very old technique however you know when when people implement like find your original English here they can't use it they can't use it why like you know in in classic in classic C++ the process of 3 through 11 why can't use the central technique because there's not enough introspection ability of the language that allows you to check for these things automatically and choose the right implementation so you can't use it so even though this technique is old it was not applicable so my point here is good introspection is important because it allows you better algorithms faster code which is a weird conclusion it's a good point but it's not obvious right faster code means you need the language with better introspection I take by our silence the thundering silence that you agree I mean are we together in this all right raise your hand if you sing the same my friends huh all right I'm getting some hands that's great all right these are not only applicable for optimization they're applicable in a million of other situations especially static if in static if you're going to find in this town a library that it appears every 88 lines as world count counts as WC counts and the standard library has the documentation in it so it's every gate lines with the documentation so if you take the documentation ology maybe like favorite 40 lines that's huge every 88 lines it's huge used everywhere and here I talked only about behavior but you can mold structure because you can put static if condition and you put a field in a structure depending on something it's amazing because you can actually change you can tune your you can put the atoms with your hands and create gold or any any compound you want any composite you want it's amazing all right so but let me make a point here this is not about Dean it's not about it's not about you it's about you because it gives you insight into how you can optimize things so let's get back to the optimization topic I'm not going to insist more on these language things well let's see the optimization so we have the time and we have the input size and this is the time taken by failed search which is going to exercise my right a fail search is going to go all the way through so that's what I want to measure so I can estimate how much we gain I get in the best case right this is a failed search time in milliseconds input size in millions of elements and we see an eye speed up because the baseline the red is the baseline and the blue is the sentinel technique very nice and just to make it clear we have i divided the things am i measuring the speed up here so make a pretty neat speed up 10 to 50 percent this is awesome because fine I mean anybody's liable to do a linear find once you know you know right it's it's not it's not nice but you got to do it sometimes right it does should it should be how it shouldn't be called fine it should be called fine you got to do it sometimes right it's a long gotta do it let's good to have an optimal fine right you think I'm cherry-picking this I mean this is old news like everybody knows about sentinels or all worse they don't know because they've been forgotten because they've been invented in what like the 1956 right do you think I'm cherry-picking I'm not sentiments are much more widely applicable to much more complicated algorithms let's so take a look at something else how about Tony horse partition partition is the workers on the short end the selection plan what's the selection problem in STL how do you polish someone else are you even in my class some elf element thank you very widely applicable right so we have partition you choose a pivot and you're going to put everything that's smaller than the pivot to the left and everything is greater than the than the pivot to the right awesome so how do we apply something as to that it's not obvious let me show you the baseline so people partition I'm not going to insist on all of those introspection things I'm just going to assume they are done so we'll focus on the code I'm going to swap the pivot into the first position this is the first thing that people that first partition does and then I'm going to do I'm going to walk the range from both ends and whenever I find two elements that r1 is one is in the wrong place the other is the wrong place I'm going to swap them and continue I want the two indices meet it means I'm done and I return the position of the pivot this is how our partition works by the way this is already a highly optimized baseline let me give you a tip here that means I'm infinite loops here that means you kind of even like what's this it's the label and what's this a label break right some I want to avoid because otherwise I need to do one more test here like an idiot I don't want to do that I want to do the minimum testing possible right so here this so so my tip was the following the fastest code is going to use a lot of infinite loops which is paradoxical right they're infinite they never end so but I'm not kidding if you want to write fast code start with an infinite loop because it's the most core construct that's that's possible it's just it's just whatever you have with a jump add up at the end right that's it and inside you get to choose your conditions and things most people are mentally blocked by the by the notion that they must insert the test here is at the top of the loop and very often it just so happens that it's better to marshal code differently in a way that test is not the test is not done at the top of the loop so you're going to get faster code if you start with an infinite loop all right if you if you take one thing from ACC this should be it I'm not kidding no if you take one thing from this talk let's say ok there's other much better so alright so this guy is going to find from the left it's going to push low or from from behind from from the number one and if I reach the the limit I'm going to break from the entire loop and I'm I'm done right but if I found that I'll misplaced element then I'm going to break this little loop here I have another little loop which is going to say well let me go from from the left to the right and again if I reach the if I reach the is a between this is me then I'm going to break the whole loop and otherwise I'm going to break here here I'm going to swap the elements I'm not going to make progress right I don't expect this to be understood in every detail but let me let me tell you one thing which is very interesting this has been implemented of course many times in many libraries and one one mistake that's the worth talking about a bit is here here I'm using greater than or equal I'm not using greater than here I'm using greater than or equal I'm not using greater than I should because at the best I want to push these limits as much as I can before I need to swap so why am i stopping at greater than or equal what do you think I'm doing that I'm trying to conserve the order no this is not stable but you're sorry does that I understand what you're saying but that's not the answer no because I have here I'm breaking from the whole loop so I'm tasting there's no infinite so it's gonna it's going to work it's going to pass he's going to pass unit tests but there is a bigger problem which is very subtle what if we have is if every if all elements are equal what's going to happen if I put so if everything is equal that's a good start this is where we want to be if all elements are equal or most right so I'm going to just go through with the left guy I'm going to go all the way up to the right-hand side so I'm going to return that the people it is at the extreme right thank you what's the problem with that it's subtle the problem is fairness I want to partition and I want my people to be as much in the middle as possible why yeah I kind of want to have things because partition is most of time used in quicksort if you put greater than here I swear to you quicksort is going to be quadratic for ranges with a lot of equal elements it's amazing one character you change one character it go from n log n to quadratics and people did that made that mistake it's very subtle it's very easy to to make I made a MIT that mistake there's an early implementation of a certain routine and I was an idiot because it's I just didn't read the literature properly very interesting so you want to be to have some fairness in the system so whenever it's like greater and equal to stop you swap even if you swap equal elements that's still better than quadratic right so what you want is kind of keep the keep the equal things kind of where they are right excellent so we found both bounds so we come from there left and right we found two that are misplaced we swap down we go on again and again again when they meet we're done this is a highly optimized base line all right I'm giving and giving you the ball here from you know I'm going to throw the ball make it faster ideas ideas to make it faster yes put the pivot as a sentinel the pivot is the sentinel when I search in this case because we have the pivot at the first position there when I start from behind I can eliminate this comparison presumably right yeah that's right roughly speaking you're on the good track yes other ideas and actually this is done yes another other ideas from other people okay all the ideas from front deep mark he is destroyed once you be the first swap you know that there's a good people tied the other end now we're getting somewhere right yes if you have a lot of equal elements I can it do can we test before swapping right if there are many elements equal I'm going to do a lot of pointless swapping I think I think we got something I'm not understanding exactly but I think that I think there's something there so let me make sure I understand this is going to apply for ranges with many equal elements okay good let's let's look at something that's going to fly always going to be strong so you know we're looking at some interesting sentinel ideas let's put a sentinel we have it on the left let's put it on the right and let's do interesting things right so let's um let's see how I can make it faster so as I said currently fine from left fine from right shlop make progress this is my algorithm yes it we're done when the index is meet and we have many index checks interspersed with the actual computation so until you know when the routine to focus on work and not overhead this kind of a basic thing to want and we can't plant this engine in the middle because we don't know the final position of the pivot all right here's the idea as did Mark said planned sentence at both ends so first thing I'm gonna do I'm gonna make sure I put the pivot at both ends I'm gonna save in temporaries and whatnot so I'm gonna just save the state there into right and here's the interesting thing that I think you're gonna like this you're not the vacancy is like not that kind you don't take enough of not not holidays right a vacancy like in Adam what is that yeah so via so an imperfection in a crystal and there's a missing electron somewhere and whenever there's an extra I'm missing another could take its place but what happens then you have another vacancy with that guy left from so these vacancies tend to have like their own behavior they have a life they move and actually they are dissolve the semiconductors work they move vacancies around so let's do the same why the hell not let's do the same let's create a vacancy in the range so we plant the same thing us and I will create a vacancy in the range we essentially we remove an element from the range we we can we make it disappear and it's a hole of vacancy it's an imperfection in the range and next we're gonna what we're gonna do is I'm gonna fill the vacancy so we come from here we fill the vacancy and then we and we leave a vacancy here then we come from here we fill the vacancy here leaving a vacancy there right and that's how the bubble goes and the bubble is going to move up to where whence the bubble going to stop the vacancy at the final position of the pivot amazing it's exactly what we wanted I can't believe it and then what am I gonna do I'm gonna pop the guy I took and I put it put it back oops I said poop Jesus am I allowed thank you John alright I'm gonna put that the thing that I took and I'm done and instead of swapping each swap is really like it's three operations I have temperate for that for that right I'm doing half swaps and we have swaps by filling these vacancies all I do is I'm doing half of swap and then the other half I'm going to do from another place right so I'm gonna save on swaps I'm going to save on index amputation because of sentinels and I'm going to have a faster partition which is amazing as nobody improved it in what 61 what we're looking at what 50 55 years we go not today improve a routine that has not been improved in 55 years I'm not kidding I'm not exaggerating already here's some ship code for you all right so it's a long function but it's not trivial I don't expect it to put develop I put a finger on anywhere but I'm gonna describe it in in you know from from a distance jump our own time what's the time terrific alright so our people partition if the length is too small I don't care I'm gonna put a pivot at the front and I'm also as I said I'm going to create a temporary and I'm going to plant the pivot in the end as well as a sentinel so I'm going to say save is the last element and I'm going to plop into the last element and from now on I consider I have a hole at the last position of the range so what's the next thing I'm gonna do how long how what search I am I gonna do now so I have a hole I have a vacancy at the very end of the range so now I'm going to start from the left of the range right try to find something to fill that vacancy awesome here we are so I'm going to go up and then I'm going to fill the hole and then I'm going to go I'm now going to have a vacancy here so I'm going to come back from from from this from this side right and then I'm going to feel and by the interplay of the of the sentiment that I have here I only need to do one test that's not work everything else is working this loop only this is not work only this is sort of all this is over management overhead right manage bookkeeping friends bookkeeping math management bookkeeping alright so this is my overhead one line in a large largest loop right other than that I'm just doing work because I do need to compare against the pivot this is a given I do need to compare against the pivot is a given right these increments are part of life they're also sort of overhead can be considered because I'm it's their indices right but I need to do them but the only comparison I'm doing is here 55 years right and my the smartest guy in the world no no I'm which I there's a real I have a lot of reasons to be modest good reasons because actually so I invented this I thought I'm awesome and then I did some literature search it turns out that some Indian guy to two guys invented in 2011 right so 50 years 50 years it's a research paper I'm I have a slide on it but you know I felt for a few minutes that I felt really good and it was worth it right all right so this is my innermost loop here it's going to go fast really nice and in the end it's not something weird is going to happen at the end because in the end it the pieces are going to be on the floor and I need to kind of fix up things kind of restore things and and you know it's just that I figure that it the worst that can happen is that the low and high they overshoot each other by two they put overshoot each other by two one or zero it's just the way the loop works so you gotta treat each case with attention so you know this is not again this is not easy to understand but in essence you need to handle all of these possibilities so if if I overshot if I overshot overshot lobe you know these two by by by two elements then I'm going to do a little fix-up here and I have another here and then finally I'm dunking I'm gonna swap back the pivot and I'm done terrific this is going to be faster all right let's see how much faster and by the way so these measurements they they were part of the build of these slides I didn't it kind of pasted a graphic no I use logic so I say make is a make file and part of make is it runs the benchmark and it produces a data file which is reread into latte and is formatted beautifully here so these are as fresh as they get I build unlike this morning I'm not kidding every time they come a bit different which is funny but you know does my new difference is every time right so we're looking at not a huge improvement but a good improvement and what's nice is again this is a function that's very important for quicksort it's everything quicksort does because what does quicksort do partition partition and quicksort quick sorry partition quick store quicksort and what does quicksort do well partition quick so quick sort and so on I could spend the rest of my talk saying this right so awesome the relative improvements to look at 0 to 30% improvements very measurable very nice I got them from the same executable yes oh why why is that I could run make twice yes uh hmm good point actually now there's trials inside the benchmark that uh repeat but you know if there's another there's a number of things we can now we can do to measure better but um you know this is good this is great because again we're looking at something that has not been improved in fifty five years and it's awesome to be able to do so so you know as I said I thought I invented this I was feeling very happy but if you if you search for the paper engineering of a quicksort partitioning algorithm by a Bianca and Ingo you're going to find that it's been done in 2011 it has not been done in 71 right which is interesting it's been done recently right so now you know what to do standards folks you go to your HDL implementer and you say you know do this right right all right don't mention it really don't mention don't mention me okay awesome so that paper however is not as is not as sophisticated as my algorithm because it doesn't it doesn't allow to choose any people it chooses the people for you which is the first of the last element and then it uses that as a pivot which is a you know problem right they have a simpler function because they don't need to do all of this overshooting thing they don't need to worry about this fix up here so it's a lot simpler but it's also a lot less flexible right but anyhow they have they had the same vacancy idea which i think is a very remarkable and you can use it in many places is pretty awesome love it and you may think I'm cherry-picking this but I'm not you think I'm cherry-picking also he chose like three one two three algorithms that he liked and no I wasn't looking for Santino's I was looking for a faster partition and this came about what can I do it's not my fault don't shoot the messenger right but there's a lot and once I once I discovered that there's a larger pattern to sentiment that just linear fine like Jesus they're everywhere they're like aliens everywhere that product was that product that product of sparse vectors they're sorted you know the sort by index to sparse vector sorted by index each element is the index and the value and it's sort of by index right and you walk them and you want to create computer dot product which is like you multiply the elements that have the same index how do you apply a sentinel their ideas dot product sparse vectors each element is an index and the value you need to multiply all equal all elements with the same index and sum them together add a final entry with the same index so no matter how you go you're going to get there right or even simple and you don't need to really add you only need to overwrite the index of the last one with the smaller index with the greater index and then you just go through it you only check you only check when the indices are equal when one of the indices is smaller you don't check you save awesome but wait there's more set intersection how do you apply that to set intersection there's a function in they still set underscore intersection how do you apply that to it huh you don't need to add you overwrite you overwrite the last one again with the largest of the two you overwrite and then you just blaze through it set intersection you need introspection you need to fix that no no except goddamnit you need to fix I know except it's like ridiculous it's wrong with you Marshall what's wrong with you who did that alphabet alphabetically by seniority get a fix introspection you gotta have without introspection code is slow do you like slow code who likes slow code raise your hand all right set intersection merge merge same thing these three they're the same pattern by the way so this was this like the greatest interview question ever so i'm maya scandals for dot product is just a lead by the end of the interview the candidate is ten years older I'm kidding because it's so much theory and practice in there is just amazing all right merge same thing lexing upon seeing how do you apply sentiments to lexing and parsing ideas like seeing up I want I have a big chunk of text and it's real excellent parse how do you what what yes what's the how do I have a big hunk of text I need to parse C++ source code if you wish how do I come on Jesus guys you put a zero at the end you put a cent in at the end and you have a big switch and your big switch is going to treat zero out meat means I'm at the end of the range the upshot is you don't need to test for each character whether you're done or not huge gain huge gain amazing make sense so you put that you put a Center at the end and then you just blaze through your parse and it says it's a token you know it's the end talk and you don't need to check whether you're done or not otherwise you need one check for character by doing may may say one more thing I like about G process concept one drive time all right to my demise okay here's the thing do you know hello world how big it is it is my hello world you build hello world and how many lines do you get after pre-processing too many I just measured this morning with Clank it's thirty thirty seven thousand lines you know people say see process compilers are slow they're amazing they are amazingly fast it rummages through 37 thousand lines of code in like a second it's amazing technology going on there it's fantastic Slyke and you know with it you know that's hello world but it only gets worse from there it's you know it's a doesn't scale it just goes worse and worse I wish had code it's like I'm not kidding 25 megabytes after pre-processing one file i'm the sheer fact that any that you can parse that thing it's just amazing compiler technology C++ is blazingly fast it just has a lot to do it's a lot of word there so now we have the situation there right and then you know guess what people come with the sopranos concepts and we want concepts to be checked for all templates without using the template you never use them but we want them checked on the assumption that somebody may have delivered the library trust that they never unit tested they never they never they delivered code they never instantiate achievement check on that assumption they want all templates to be checked early those 37 thousand lines of code contain obviously lots of templates because if they're in the headers a lot of templates and now we want to not only spend time on parsing that text we want to spend time on checking that as well right ridiculous do you agree agreed yes okay now I'm trying to do the collect names here and I should I'm signatures okay scientists signed the release form after this okay ridiculous I want to check things on the assumption that some idiot delivered me a template that they never cared even instantiate in their life to make sure it at least kind of does something you know it lists it compiles right so anyhow let's get back to efficiency all right so my point here there's treasure everywhere you can find good things to optimize everywhere you just have to find it how do you find it by using introspection right so I'm gonna write a book I have a great title I need to write the rest of the book Scott Myers granted me the title it's his title and I you know here's very nice because it it's a great title so again I have the title I need to write the rest and fill the blanks right fast where a lot of optimization techniques here are some things and I'm going to talk just about half a minute about each benchmarking techniques you know that most people don't know how to benchmark so this is going to discuss that strength reduction you replace the stronger operations with simple operations changing algorithms for fun and profit right cache friendliness how to arrange things such that you have better cash for endless useless in direct access avoiding the writes things like that in the right elysian how to get rid of writes through pointers which are the worst kind right memorization and obligation abbreviation is a word that I invented 24 hours ago is the opposite of memorization John Lake has said it what was the word ignorant is better and I said you're ignorant that can't be you know etc ignorance so memorization is I'm I'm saving computer data sometimes it actually it repeats better to recompute it in some subtle cases and let's call that the ignoble avoid ignorance or abbreviation who's for obligation who's for ignorance the people's focus all right hoisting and lowering what's hoisting a lowering hoisting it's a classic compiler optimization actually take things also loops you take things up and pre compute computer once and you're done and awesome what is lowering the opposite of hoisting I put things inside what do you want to do that when you are role for example right when you are role you want to lower lowering also has a different meaning which is you want to replace higher level constructs with lower level constructs and that's also sometimes effective optimization and a lot more there's so much stuff I this just so much stuff that's good there about writing fast software that I learn in fact five and a half years at Facebook and even before that it's um I'm bursting with enthusiasm about about this all right with that I'm done thank you so much thank you you've been great thank you to have a meaningful question John questions questions yes now you both get giving the ball so come on that's heavy okay if this the things you're talking about in the last slide the compiler optimization techniques why should we be spending our time coding to those when the compiler bhangra for you understand so these are compiler optimizations why am I talking about them in in a book and optimization these are also optimizations that you can do by hand in the very often cases when the compiler doesn't for example strength reduction you replace multiplications with a with additions for example right it turns out that you can't change entirely algorithms to unrecognizable forms if you look for for a strength reduction and that's not something that the the compiler can do you change the algorithm to with a one that yes question question I didn't get it sorry the book can be written faster okay good one all right question one more question question for the ball all right what is the Scott Myers give up on fast where I was where he was working on it was looking forward to her so I'm glad you've taken it out but why did he give it up why did Scott give up writing fast word um I think he retired he's just yeah alright we're done thank you very much so just want to do a quick poll who's here is a member dat see you and that's what that's 50% 60% what's that look like so for those who are not want to find out more about the ACC you we have our stand on the opposite side that wall come see us and we'll explain what it's all about just want to say thank you very much and thank you very much I think you guys thank you you
Info
Channel: ACCU Conference
Views: 30,543
Rating: 4.8860397 out of 5
Keywords: ACCU, C++, Developer, Conference, 2016
Id: AxnotgLql0k
Channel Id: undefined
Length: 62min 17sec (3737 seconds)
Published: Mon May 16 2016
Related Videos
Note
Please note that this website is currently a work in progress! Lots of interesting data and statistics to come.