CppCon 2019: Matt Godbolt “Path Tracing Three Ways: A Study of C++ Style”

Video Statistics and Information

Video

Captions Word Cloud

Reddit Comments

Very neat trick with the logical or for branch prediction. I've not heard that one before!

👍︎︎ 17 👤︎︎ u/little3lue 📅︎︎ Oct 11 2019 🗫︎ replies

oh boy, his face when he realized he's not getting any questions breaks my heart. And it was such a fun talk!

👍︎︎ 18 👤︎︎ u/RestauradorDeLeyes 📅︎︎ Oct 11 2019 🗫︎ replies

amazing, never anger the branch predictor!

👍︎︎ 8 👤︎︎ u/Ipotrick 📅︎︎ Oct 11 2019 🗫︎ replies

One interesting thing is that typical ray tracers don't have this specific problem in this specific way because acceleration structures mean that rays are rejected more methodically and less randomly. The ray triangle test happens only if the ray is close to the triangle in question.

👍︎︎ 7 👤︎︎ u/ShillingAintEZ 📅︎︎ Oct 11 2019 🗫︎ replies

Finally! I've been waiting for this to come up ever since I saw Matt's live stream implementing this stuff.

👍︎︎ 12 👤︎︎ u/NotMyRealNameObv 📅︎︎ Oct 11 2019 🗫︎ replies

I think this is the repo: https://github.com/mattgodbolt/pt-three-ways

👍︎︎ 13 👤︎︎ u/little3lue 📅︎︎ Oct 11 2019 🗫︎ replies

At the end he talks about splitting X, Y, Z to utilize SIMD. I've found that not doing that, treating the vectors as vectors and using the "newer" 4.2 instructions is actually a lot faster.

(for some reason youtube isn't displaying comments...)

👍︎︎ 7 👤︎︎ u/assert_dominance 📅︎︎ Oct 12 2019 🗫︎ replies

Can someone explain how he was misusing ranges?

👍︎︎ 6 👤︎︎ u/esdanol 📅︎︎ Oct 11 2019 🗫︎ replies

I had to miss this during the conference, so I'm excited to be able to watch it now! Thanks again for the front-row support during my talk :-)

👍︎︎ 2 👤︎︎ u/binjimint 📅︎︎ Oct 11 2019 🗫︎ replies

Captions

good afternoon everyone is everyone having a good conference so far it's amazing isn't it absolutely amazing well thank you for being here thank you for coming to my talk I wasn't expecting to be in quite as big room as this one so I was expecting just to have a few people to talk to him have a like nice intimate chat about something that I love but thank you so today I'm gonna be talking about path tracing in three ways a study of c++ style this was an idea I had had a six or seven months ago about a talk that I could give here and let me tell you a story so often when people contact me they will ask how do you learn programming languages I've seen you do C++ I've seen you do low-level stuff how should I learn this this kind of thing and my sort of go-to answer for this has always been find something that's relatively short that super cool has like some instant feedback and write it in that language I learn by doing I'm sure many of you do too right you can read as many books as you like but until you actually sort of sit down and start typing on a keyboard and experience the usually the horrible compiler errors and things then you don't really get a feel for the language and how everything fits together so I've used this technique myself when I've been teaching my kids how to program in Python so this looks very familiar to one of Ben Smith's slides if you remember he had an ASCII Mandelbrot generator and I was pretty effective it was very short you know a couple of lines of code and I could explain what everything was and that was cool I also used it to teach myself how to program in JavaScript when I wrote a BBC micro emulator which is an awful lot of fun incidentally so I wholeheartedly recommend you doing an emulator or something like that and so when I was thinking about this talk and how much C++ has changed over the years that I've been programming it I decided that I would try to look at C++ through like a new lens by writing different styles than my natural style and basically taking my own medicine and apply my idea of writing something to learn a language to a language that I already knew before we go too much further I want to be absolutely clear I'm just a sort of grunt run at the level run-of-the-mill programmer I am NOT like an expert in any sense of any of the things I'm going to be talking about so feel free to me right now if that doesn't blow your boat it's not expert advice anything I say or is it nothing I say will be considered to be expert advice on either path tracing or C++ code Styles in general and absolutely it is not one style is better than another style as you'll see as we go through what it is though is my interpretation of what those styles are and it is real code you can go to github and find this right now and have a laugh at my expense and above all I hope this is interesting because I love this kind of stuff and and fun we'll see what styles did I pick well the first one's a bit of a cop-out this is kind of my reference style this is how I write code naturally already mostly it's object oriented and so that's one of the three styles I picked what do I mean by object-oriented I mean that the data and the code are tightly bound together and if I need to have things that behave in different ways I'll use like virtual methods and inheritance to have polymorphism the second style I'd picked was functional programming and I realized this is a kind of a difficult thing to say it means a lot of things that are a lot of different things to different people but my interpretation is to separate the code and the data and have mostly immutable data that is transformed by a sequence of functions using techniques like passing functions to other functions for polymorphic type behavior that means using variant types and other sort of algebraic data types like optional and some of the other cool things that are coming in the language soon actually and the third style I chose was data oriented design this was kind of a hot topic over the last few years when I was looking doing the research for this I saw that there were a couple of talks over I think in 2017 and 2018 both talking about data oriented design and when I sort of looked into it it sort of felt a bit like old-school c programming to me my my background was programming video games and that was all about this kind of thing so what data-oriented design is is sort of starting from the bottom from the like very base components of like what operations do I need to do and then designing my data structures around those operations so that they're efficient for both cash usage branch prediction and all those good things and then sort of building my way up so whereas like object oriented and functional a top-down data oriented is more bottom-up you might be asking what path tracing is but thankfully Ben Smith has already shown this image earlier in his keynote when he was demonstrating the amazing web compiler thing he has there's anyone heard of a web type compiler thing before I've never seen one before so this is an image that's generated by Kevin B sands per small PT so I picked path tracing because I'd done it before actually when I was learning go and rust and it's provably small because that image is generated by a hundred lines of C++ code now you don't really want to look at those hundred lines of code because they're a bit squished in but it's not terrible code if you run it through clang format first of course so what is path tracing path tracing is a way of making images it's very similar to ray tracing and I'm gonna put a big caveat here that again I'm no expert at this I've read a few books and this is my interpretation and distillation of those concepts but we have a scene in this case a beautiful red sphere there is a very bright light sphere up up in the corner we have an eye point where our virtual camera is going to be and then we sort of project a screen this is where all the pixels of the screen are in that space and for each of those pixels we're going to fire array from the eye through that pixel and see what it hits in this instance it hits the sphere and now we say how could light have got to this part of the scene how would I know how bright this is because of lighting and this is where path tracing and ray tracing sort of separate and diverge because in a traditional ray tracer what you do now is you look at your big list of lights and you see which of the lights could possibly have reached you by casting a rate of the light specific in path tracing we say well the light could have come from anywhere light bounces around is one of its defining characteristics so we won't just restrict ourselves to looking in the direction of lights instead we look at the surface where we hit we put a sphere a hemisphere over that point and we say let's collect all of the light that comes in to this point and didn't see how it would interact with the material and that's the amount of light that would bounce back towards the camera well it turns out some really really really difficult problem to solve and there's no sort of analytical solution so the pragmatic solution is we pick a random direction and we start looking in that direction so we then look down that ray and then we carry on the process this is a very recursive and a very recursive process and then because we didn't use some randomness here we probably need to do it a few times just to sort of get a bit of a feel a sample for it and in practice we'll do this like tens of thousands of times per pixel and that's what makes path tracing both very very slow but also generate very good images so in summary for each pixel far as zillion rays into the scene bounce them around randomly accumulating the intensity in color and then average them out and then ask the color of that pixel it's worth mentioning that lights are not actually special in any meaningful way in path tracing it's just that some materials you might encounter are just intrinsically bright and they don't doesn't matter how much light is falling on them they will always be emitting some amount of light so effectively that first ray that goes in will bounce around the scene forever until it happens to hit something which is bright which could be a long time my path tracer is pretty simple I mean small point that one we saw earlier we Kevin be since one is also simple but mine's even simpler I have extremely simple materials I only deal with spheres and triangles and the scenes themselves are defined in code in terrible programmer art so I apologize for the things you're going to see and that's mainly because I had to write it three times and there's only a limit to how much code I'm prepared to copy/paste incidentally this kind of thing is a really difficult thing to do if you have been trained for years to like factor out common code because you're forever duplicating code but on purpose I've made you wait long enough these are pictures that my past racer can make and it's it's really quite amazing because it's relatively simple like I said I've pretty much explained the whole thing to you and yet we get all these interesting effects and I can't really point but if you look up in the corners of the rooms it's darker because there are fewer places that the light can bounce around to reach the corners if you look at the left-hand side of the left-hand box you'll see that it's tinted red because the only way that light can reach there is to have bounced off the red wall next to it and similarly for the green side of the wall so that's a very famous image that comes from or I seen from Cornell University where they made a they actually physically made a box called the Cornell box and so painted it and took very careful measurements and it's used as a benchmark it didn't have a magic shiny sphere in it though but like when you've got spheres what are you gonna make the other image incidentally is the Suzanne monkey from blender who has heard of the Utah teapot yes so the blender folks so blender is an open source modeling package they issue that teapot in favor of this monkey and I think I prefer it actually just a quick note on the general approach I didn't rewrite absolutely everything for all three versions the math library I made or maths sorry sorry mum the math library has a vector class which is essentially an X Y Zed so it said though an X Y Zed and the one Brit in the front row and I have a norm free which is a normalized vector it's a vector that's known to be normal this turned out to be a really cool and important thing during their development because now I could actually in the type system make some observations about things that were already normalize which was pretty cool and found some optimizations I have a ray which is a position and a direction I have a hit which is just a point in space the normal at that point and some other information essentially just geometric and there's some other bits and pieces for materials and cameras which I didn't want to redo three times all right let's go with the first way the first style is the object-oriented style and I'm going to spend the most time talking about this because this will give the foundations for the other approaches and the other approaches I'd be mainly pointing out the differences and the things that I thought were important the major components of the renderer are how I store the scene how we render it that is like this entry just the main loop the main of the function radiance calculation radiance is the amount of light and light intensity flowing down a particular ray and again I'm going to wave my hands a lot here because if you look in the literature it's all about perps solid angle and other things but we're just going to simplify for now materials which is how the light interacts with a particular surface so the material controls that aspect and then the other aspect is having array finding the intersection with the point inside the scene give the poor the live caption or a little break there mm-hmm right there's going to be a lot of code sorry I forgot to warn you this is a very code heavy talk but I'm gonna have to get through it because I want you to see the things at the end so this is what my scene looks like or rather this is an object that's in my scene and in the object oriented version of course what I'm gonna do is have a virtual method so this is how I'm going to dispatch based on the different types of objects that could be in my see this intersection record is something that sort of says something about where the Ray intersected and the material that was at that point and here I'm returning a ball saying whether this particular Ray intersected this particular piece of geometry and if it returns true obviously if it did force if not and then if it did intersect it fills in that intersection record I'll be honest the first time I wrote this I use the stood optional because it's the absolutely obvious thing to put here but when I then went on to start doing the functional programming version of this I realized that stood optional is kind of a functioning thing and so in order to sort of slightly caricature each of the Styles like actually backed it out and in doing so introduced about three bugs so stood option was great right our render loop looks like this and I'm sorry I've shown parents anywhere around here there are lots of Roderick raw loop here we've got bra Y for all X for all samples I'm gonna generate a random ray that passes through this particular pixel on the screen now as I said it's gonna there's gonna be a lot of samples we're using a lot of randomness here the degrees of freedom in the camera are within the pixel itself which gives me a little bit of anti-aliasing as I randomly pick somewhere within that pixel but also it takes into account the parameters of the lens of the camera itself so if the aperture is wide then things off in the distance will be very very blurry because the samples will wiggle around all over the shop I realize this it's not technical terminology anyway we get around them ray we say how much light is coming down that ray we add it all up and at the end of it all we average it out and plot it at that point okay how do we work out how much radiance is coming down a particular point like I said this is very very recursive so the first thing we're gonna do is have our sort of base case to recursion at some point we have to give up otherwise we'd be here forever if we've reached the limit of recursion we say well there's no light coming a bit of a cop-out there are some clever things to do but and this is not clever code we asked the scene if this ray intersects anything and although I haven't shown it that scene code basically goes through the array of all of those primitives and finds the nearest one to the Ray the nearest intersection point or none if there wasn't one if we didn't intersect then again we shot off into space we never saw anything there's no light coming that way if we did here we're gonna do a little trick here this is something I didn't really talk about before so this is the point where we've discovered that that point on a sphere and we're just gonna pick a random direction it is that if you just do one sample it's really noisy obviously there's sort of a lot of randomness we're likely to dismiss the lights we're Vincent Lee likely to to get all sorts of speaking lit especailly effects so as a sort of concession to getting a decent quality image with fewer samples on the first hit I actually do multiple samples and I stratify them that means that I'll pick summat like randomly around the three o'clock Direction somewhat randomly around the nine o'clock direction and round or three six nine I can't count sorry but like and now we're in the inner loop loop of determining how much light is coming into this particular intersection point we've got another loop two loops there for all of the new samples for all of the V samples you can think of those as like polar coordinates we fiddle them around with some randomness and then we pick a one more random number that random number is then given to the material to say hey given all of this information what kind of light might be coming out of you the material will obviously need to go and called the radiance system again to say well I might only be bright because there's light coming in from somewhere else so we have to give it a way of recursing of course what I could do here is hand it the the whole renderer again and then all of the other parameters the renderer needs in order to keep track of where it is but that wouldn't be very home so what I've done is I've made an interface class which is this sampler you can see which kind of binds together all of the information and provides a very simple interface to the material to just say hey what lights coming this way that's a virtual method as well which hurts me but we'll talk about that at the end so anyway we sample a few points around this material and at the end we ask them the material based on these samples what would the total amount of light be coming out of this point this is essentially the point where something which is intrinsically bright can add in something extra here if it so chooses how do we determine it intersection so this is an implementation of the sphere primitive where you can see that we have a binding of a sphere class which I realize is a confusing name this is kind of like the Platonic sphere it just has a position and a radius and we can do geometric level tests with it it doesn't know anything about being in a scene and this primitive kind of binds it together with the material we ask the sphere if if it was hit by this particular ray and if it isn't we've can finish there otherwise we return an intersection record where we bind that hit information with the material the interesting thing here is a more sophisticated primitive she picked different materials depending on where on the primitive you hit so that's why it's divorced from the actual sphere object this is about the most hairy maths you're going to see this is the intersection code inside the sphere effectively we're solving in a quadratic equation here and if you can remember the maths it's like plus or minus B sorry minus B plus or minus 4ac square root of 4 AC all over 2a so there's a square root in there and so this top bit is getting the the bit that we're going to square root but if it's less than 0 there is no square root there is no real roots to the equation which is in this is this means that we've missed the sphere so at that point we know we can bail out early otherwise we take the square root we do a whole bunch more maths and we end up with a hit position and we're done triangles are a bit more complicated we're gonna see them a bit more in depth a bit later on but I'm going to gloss over them for now materials there's that interface that I provided to the material to allow it to sample out as you can see it's a really simple interface it just takes a vector e and the Ray and is expected to provide the material with the light coming that way and then here's my virtual method for the material itself it's given all that information and expected to work out by itself the light that's coming in this is my badly named shiny material which is a little bit shiny and a little bit not shiny and that's where that p-value comes in if you remember that random number that we picked the randomness in this incidentally makes it really easy to add all sorts of things here because we can just use the law of large numbers and use the probability of randomness to sort of pick different routes through knowing that eventually over all those things it will sum out to something that's going to be okay so in this instance we say this machine this material is partially shiny and partially diffuse now I could shoot two rays off one for that doing the the reflection part and one for doing the diffuse part and then sort of balance them out like 20% shiny 80 percent diffuse but why don't I just pick one or the other and that probability work it out for me so if that random number coming in is less than how shiny I am I'm gonna sample not actually in a direction like I described before like three o'clock six o'clock nine o'clock but I'm gonna example off in Akko around the reflected point that is the point where the light would have bounced off of this particular material and gone off so a mirror would have a cone that is infinitely thin so the only light I will sample for a pure mirror is the light that bounced exactly and came to the right that we are tracing but if I widen the cone out a little bit and use those U and V parameters to sort of pick a random point in that cone then I start to get a sort of blurrier image and that allows me to have things that are alike slightly less polished and a perfect mirror for the diffuse sample we tint it with the color of the object itself and then we do sample randomly in that Hemisphere okay we're done that's all the code I mean mostly of the code we get some pretty images that look like this and you can see all the lovely interactions between the lights and things and that's great that's really rewarding you know you write a bit of code and you get some feedback like this although of course as anyone who went to Barbara and Hansel's talked what really happens is that you write a bit of code like this and you render it and get a big black screen and then you have to go why is my screen mark is my camera pointing up is my uh my objects infinitely smaller as the intersection code not working this is incidentally why we write tests which you all write tests though right what did I like about the object-oriented version well it's familiar it's how I write code naturally pretty much the code layout was familiar the testability was good each of the objects could be poked at from outside really easily I could instantiate the Platonic sphere and fire razored it until I was happy that all of the intersection with the sphere worked I could make a sphere primitive and I could use its virtual interface I could all these things it really felt familiar to me and I was able to get a pretty good test coverage and the performance was pretty good oh I was impressed I haven't got time to go over all these things but like the D virtualization that compilers can do these days is nothing short of amazing thank you compiler writers I can't do a talk without thinking compiler writers somewhere right what did I not like about this well I giving up standard optional was the pain and introduced a bunch of bugs and arguably I'm overusing virtual when I gave an early version talk to my colleagues they they slapped me down for having like given a bit of a bum deal to the functional programming and data oriented design by like caricaturing them but giving object torrent ism like a lovely glowing right so I they said well why don't you this should be virtual that should be virtual you should input them okay and so this is the version you're seeing here but I wouldn't naturally write that many virtual functions let's move on to the functional approach incidentally this is an array of spheres and at the top right hand corner is the perfectly shiny version and to the left it's the more blurry and then from top to bottom it's like how it's actually the index of refraction there's a lot more mass that I'm not showing you here so a note on the functional programming I use ranges because I know they're coming and I've just come out of Jeff Garlin's excellent talk on ranges and now I'm aware about how badly I'm misusing them so please be gentle on me so I'm using ranges because I feel that that's the sort of functional thing to do taking data and sort of processing it through a number of views without necessarily mutating the original data seems very functional to me I also used TL optional is Sai brand in here all right there TL optional libraries awesome it's it adds on some things that are more like algebraic data types again things that I've in other functional languages come to sort of rely upon you'll see them in a second my clicker works there we are so the scene is stored very differently instead of having all of the behavior in the objects I just have these pretty much boring structures with the information here this material is not polymorphic incidentally I just store all the possible things that the material could be in one it punted on that for now and in the way that I get polymorphic behavior is I use a variant of the the two types of things that I can have the rendering is probably the biggest change gone are those raw loops I'm very pleased with myself for that and instead and this may be a terrible misuse of the library as when I was asking on the slack over some of the error message I was getting I think someone said oh my god this is how people are going to use the library isn't it which I don't think is a positive thing but we're going to get a view of integers this is a view that just creates a something we can iterate over that gives me that number 0 1 2 all the way up to height and then we get a view of inch from 0 to width and then we take the Cartesian product of them well that means is that we'll get for all X's for all wise oh that's useful to know it's whoopsie LOV m9s out incidentally everybody some compiler X bra already we pushed it earlier so anyway that for that Cartesian project product will return tuples of Y and X because that's the way round I did them and then I'm going to pipe that through a transform that transform is this lamp of this I guess it's a capture a what's a closure here I capture a few things and then for every pixel we're going to instantiate a random number generator this is my concession to making this as sort of deterministic as possible the random number generator is seeded by the pixel coordinates and the sort of sample that we're doing at the moment so that it's deterministic and I'm think I'm ok or the F P front although I'm sure someone can correct me in a minute now this is interesting because although we've created all these views with all these pipelines of interesting things nothing's actually happening yet views are very lazy it's actually the constructor of this array output class which is affecting just a glorified stood vector of pixels if we pass a range into the constructor of the array output it will just consume things until it gets to the end and as it's consuming them it fills itself in so effectively the construction of the output container is what causes all of the calculations to happen and for the results to be written out and that's kind of cool there's no actual mutable state anywhere similarly when we're actually looking for the light coming in to a particular point we're going to do the same kind of trick this is before all the you samples and for all the V samples I've hidden the code for this because it's a bit ugly but we're taking the the the integer count 0 1 2 up to number of you samples and turning it into a couple of doubles the randomized around like the clock face and then we're going to calculate the radiance at this intersection this is the equivalent of where the material stepped in in the Oh Virgin this is me saying how much radiance comes into this particular point in the sea based on these parameterised prepare these parameters of course we still need a mekin mechanism for letting the sample function here this radiance and intersection call back and get radiance from the scene and in this instance instead of passing a virtual base class we're just going to give it a function and this function will hold all of the information needed to call back into radiance with the exception of us adding one to death so that we remember that we're getting deeper through the recursion each time oh and then we accumulate over it now I think Jeff said that accumulate didn't make it into C++ 20 is there anyone couldn't confirm that but I'm using it anyway because again these were all views inside this part here nothing would actually happen until I started doing something with it and by calling accumulate it will add them all up and that's exactly what I want to do this radiance and intersection function is pretty hairy and so I've I'm just really highlighting here that that recursion point here the the passing the radiance function allows this function to do its work and then also do more samples out into the scene intersection is where it starts getting interesting so you remember that the base type of the primitive was a variant of all the things that could be well it just so happens if my clicker works well we need to visit it in order to find the actual type that's held inside that variant but through careful design or accident choose what you wish the both primitives had a dot shape and those shapes although they're of completely distinct types both supported an intersect method so I'm able to use a template here to visit it and that saves me writing do essentially duplicate code the interesting thing about the intersect method is now I am returning an optional but importantly it's that TL optional and the TL optional comes with it a bunch of useful extra features so the intersection which will I'll show you in a second it has it returns an optional intersection record and then I call map on it what map does is allow me to change the type of the thing that's held inside the optional but only if the optional carried anything in the first place so if there was something in the optional ie if it was a successful hit then my lambda is called with the contents of that optional and I'm allowed to return a completely different type here and the result of this will be an optional of the new type if on the other hand it was empty then just an empty optional of the new type is returned and that means that I don't have to worry about if things stuff and if this optional is set or continuous stuff like that I can write fairly fluid code and you'll see me chaining this in the next slide because this is how once I sort of got this in my this the bit between my teeth and got excited about like how much this could change my code I thought well I can actually rewrite how the intersection is done for the functional approach so if you remember we did that determinant thing and then there was an if check that said well if it's less than zero then clearly I can't take a square root of it I wrote something called safe squared and that returns an optional double so if I can take the square root I get a square root if I don't I get an empty double and now I can write a nice chain of and ends and maps so this will return either the square root or nothing now we look at this and then and then is very similar to map except it allows me to return an optional itself which allows me to also opt out and say actually I failed at this point because there is I didn't show in the other slide there is this part here where there's some other checks that need to be done even though I have success when he got in a square root and it still means I missed the sphere so if it's too close to zero then I have to say yeah you're good about to blow up here let's just pretend it mr. sphere otherwise as it happens if you remember with quadratic equations you end up with two solutions I picked them the solution that is nearest to the beginning of the right here and then I can map over that and say well okay I've been I've got a distance now I can go into all the other calculations I need to do it's all very neat what did I like about this approach I didn't really show it here but icons did all the things you know while I was getting myself into the mindset of immutable data I just went through all of the functions and all of the local variables I put constant front of them and I was surprised at how many times I just flippantly modified local variables when it was really unnecessary so this helps and again this is not really a functional thing this is just a mindset thing which i think is an important aspect of doing this kind of thing I think the cloud is clearer I like the optional stuff the range of stuff I'm on the fence about but it's coming soon so I'm sure we're all gathered more experience about how to use it well testability was about the same as the object-oriented version I didn't like the cryptic compiler error messages this is probably a fault of the user as they always are but every time I made a small like tweak and something didn't compile in ranges it was template hell all over again which is not really the fault of the ranges library I know for example when concepts come in we'll get much better error messages and it would help if I read the documentation instead of just assuming I could work it out by hitting alt space or control space control space which most in my program these days I am a little concerned that I am doing something that is fundamentally not not functional programming bypassing the mutable random number generator around it doesn't seem very much like the pipeline it's kind of thing that you can do especially if like I was able to like parallel eyes one of the parallel eyes gosh that's difficult to say it's things it might break down and in a way that I don't understand the performance was not as good and initially I thought it was because of that random number generator I'm creating per pixel it's it's not free to make a Mersenne twister but it turned out not to be that we'll see why at the end let's move on to the data oriented design and for the one person who might recognize that yes this is the BBC micro owl that people of a certain age in Britain would have seen on their school computers that's the problem with actually doing this kind of thing incidentally is that you get distracted making pretty pictures and you know when you're a coder this is the kind of quality of art you can come up with I sat down you should see the ass like that I had to type in stars and spaces to make it very pleased anyway the tenets of data oriented design as I understand them are designed around the most common operations design your data by the access patterns to those those sorry design the data by the access patterns taking into account caching effects and things like that try and avoid branches or make the branches as predictable as possible and as part of that don't have polymorphic objects where you can essentially randomly determine functionality as you encounter each object separate things that can be separated so that you can have well let me show you oh no sorry huh forget about this slide I'll show you in a second the changes that I made have basically determining that intersection was by far the most common operation and a huge caveat here no one in their right mind writes the kind of rate caster or the path trace that I've done here because you would use some kind of acceleration structure to cull very quickly a large number of objects based on the direction of the Ray and things like that but I was actually sort of looking to see how much faster the data oriented design would be given that I'm going to give it a huge amount of object to intersect with every single time so anyway the intersection is the most common operation by far and we the observation I made while I was writing this is that I actually only need the information about the nearest point if a ray goes through a complicated seam in the other two versions I'm calculating this hit which has all of the extra information about the normals and things and I might be throwing it away because it could be one that's off at a point of infinity and I only have two types of object so I can just separate them into all the spheres and all the triangles this is what I mean by that and even more I've broken this down by the access patterns so the very most common thing I do is intersect a line with a triangle in order to intersect the line with a triangle or array with a triangle I need to first of all determine the point on the plane where the Ray intersects with the plane of the of the triangle and then determine whether or not that point lies inside the triangle and it turns out that operation only needs to the corners of the triangle I don't need its material I don't need its normals I don't need anything else I might be storing with it so I've separated that out into his own array and then the other stuff is in parallel arrays similarly for the spheres I have this sort of platonic sphere and then for each of sphere there is a material stored in a separate array and then the intersection looks something like this I'm gonna count like with an index it was horrible giving up range for and I think I could do it I think I was chatting with somebody about how we could make it work with with it with a range for but ultimately I'm trying to iterate through a bunch of arrays at the same time and I'm gonna do the maths and if I didn't hit I can skip this this sphere bubbles otherwise if this is the nearest sphere just remember its number I don't need to do anything else right now because there could be 20 more Spears in front of it that are nearer and then I go around the loop and the math is fairly complicated but the compiler loves this it's got a very simple loop that it can look through and really do well with so it optimizes it really well if after all that we still didn't hit anything well we're finished otherwise now we do that work to work out actually where the hit point is what the normal is whether we're inside the sphere and outside the sphere all that good stuff and at this point we need to look up the material so we only actually look at the material for the thing we actually hit similarly for triangles loop over all the triangles we calculate the U the U is kind of the left Ness and rightness of the point on the plane so the triangles are the triangle plane is set up so that zero to one is like the the spot where you'll be inside the triangle anything else is outside of the triangle and similarly for this V coordinate so if u is less than zero we're off to the left if u is greater than one we're far off to the right so we can skip otherwise we calculate the other coordinate and check that and then we check the distance and then this is a bigger win actually because those normals on the triangle are quite big and so doing all of the other calculations I needed to do just for the one triangle I hit is super the rest is actually broadly the same that's why I've just shown you that highlight of the intersection what I like about this approach well the ability to optimize is great right I am it's the thing I like the most and I like the performance until I really actually measured it properly we'll see what did I not like about this well the testability went right down I mean I'm probably doing it wrong and I would welcome anyone coming up and correcting me on this but once I started making these kind of very low-level changes where I was separating out the data into separate arrays they were all held together where I no longer had this nice sphere object I could just test the heck out of and I no longer had this kind of sphere primitive that I could fire a razor and make sure it worked I had a scene that only worked by itself as a unit and so all of my tests devolved to construct a whole scene put a single sphere into it and now fire rays through the whole system and see if it hits in the right place and that felt more like I was writing integration tests than I was writing individual tests now again I could be doing that wrong also it was rather difficult to change you know I've the the the the data patterns inside sorry right the DD the objects that you're working with obviously changed the way that you'll get a layout stuff and so if I needed to add a new type if I decided to add I know a cube as a primitive then you can imagine in the oh oh it's just I make a cube type and I chuck it in the list with everything else and it just works right and I tested the cube by itself and that's great pretty similar in the in the FP case although there are a couple of places where I might have to update the code for the variants that I'm using but for here I'd have to make a whole new array and I'd have to write a whole bunch of code in the middle of the guts of the rest of the the scene and I'm gonna be honest with you right now my intention at this point was to go off on a and now I'm going to put acceleration structures in and have bounding volume hierarchies and speed it all up and everything and then I was going to show how difficult it would be to sort of retro fitness into the data oriented design but something happened before we get to that what is my favorite well all of them I love everything about all of this this this project I learned so much about what it means to be functional I learned how to use ranges or how to misuse ranges I really got down in the weeds with the data oriented design thing that was a lot of fun and I think we can all agree C++ is best because we can blend all of these things together and use all the different styles of programming where it's appropriate and not many other languages can say that I think let's talk about performance this is the machine I tested on and this is all of the things you do when you're doing benchmarking and I I ran it multiple times and they were always within one second of each other so I'm only gonna show like the results of one of this so the first thing I did was look at this Cornell box scene and this is a 256 by 256 image 32 samples per pixel and you can see how grainy the image is because this is partly the Montecarlo thing going on here in order to get it as nice as the pictures I showed at the beginning I think I did something about 20,000 20,000 samples it's definitely pays to go multi-threaded at that point incidentally I've got a whole bunch of threading code and you can rent these amazing 96 core machines off of Amazon it's so much fun to play with just remember to shut them down when you're done with them though the same so here we've got something that kind of looks make sense to me right this is a very small scene there's only one sphere they shouldn't really be once they're in there at all but I couldn't resist myself and there's 38 triangles so all of this is gonna fit in the probably l1 cache so all the cool things I'm doing in the data oriented design probably aren't gonna make much difference at this scale the functional is slower and I'm not really sure why although put a pin in that but the fact that the oo and the data oriented are the same is like yeah the owl scene is all about those spheres there's a hundred of them in the scene and only twelve triangles here the data oriented approach is starting to shine we've got more information that can then configure the l1 and I don't know how about the l2 at this point I didn't actually measure it but you can see we've got like 96 seconds for object Dean started to take a bit of a hit functional near enough the same very charitable and data oriented is just acing it here and that's awesome and at this point I was gonna go off and do my acceleration structures prove how difficult it is to put it into DoD and then that would be the end of the talk but this happened I tried this this beautiful monkey this is nine hundred and seventy triangles and two spheres what on earth happened to the data oriented design here it is slower than the object ordered this is meant to be the thing that I was optimizing for this is why I did all this work it didn't work out the way I thought it did and two weeks ago when I gave the first version of this talk it ended here because I didn't know what was going on and I was in a state of utter despair as I consider myself to be one Oh somebody who knows these kinds of things but I just couldn't work it out now because we only have a relatively small amount of time left I'm gonna give you the what we should be the second hour of this talk in ten minutes maybe they'll invite me back so what happened as I say it took me about ten days to work this out in the fullness but we're gonna give you a kind of condensed version who here knows about Perth LEDs in the tool the Linux tool yes it's brilliant isn't it so I know the the kitchen sink of what the hex my program doing in this instance I REM the the three different styles on the Shazam scene under perf stat which records all sorts of information about the CPU state it records the number of CPU instructions that were retired the number that were the number of branches that were predicted correctly the number of branches that weren't the the cache each state all those kind of good things it kind of presents you at the end with a big chunk of this is like a breakdown of what happened and it looked like this or at least the the important part I've cut out here so 252 billion instructions for the object-oriented version which is broadly the same as the functional incidentally that's 238 billion but look at the date oriented design it definitely run fewer instructions it ran a hundred and fifty billion instructions I need to beat billions isn't it but that branch rate is the thing I'm want you to look at naught point six percent in the object-oriented case that's kind of what I'd expect from like playing around with the branch predictor and things like that I've become so impressed with how well it performs on almost every circumstance that point six is kind of I take for granted but just think about it this frankly clear win of the branch predictor funny we could get it to like predict lottery numbers or something although it hasn't done so well later on has it so let's look down at the functional version here we've got five percent of all branches were misses that's a clue actually as to why the functional started to drop off before the the DoD version went bonkers so here in this final set the data oriented design twelve percent of all branches were miss predicted so this is a multi minute process here I ran the thing for like five or six minutes and through the entire execution of the program more than one in ten branches went the wrong way and send the processor off in some kind of crazed thing before it went up well now I've got it all the way back and start again and that obviously is just no good and it seems to be what care causes the performance issue so you won't be able to read this but it's up here for sort of posterity and maybe for those in the youtube can zoom in these slides will be available anyway but this is a capture of me doing a recording of where the branch misses happened and on the most recent Intel chips you can actually enable a mode where you can count precisely where the misses happened although this is what I was staring at two days into my 10-day like long navel gaze over what on earth was going on because what I hadn't realized is that the the place where the branch is marked as having been miss predicted is actually the correct destination the corrected destination of the branch not the branch instruction itself so what you can see here this yellow line is highlighting a branch the next line underneath it has and I can't even read it 44% of the time it got it had to be restated back to this point why that branch was miss predicted above and he went whoops and came back to here if we look at the destination which is the second from bottom and that also has 46% of the time forty per six percent at a time it went it fell through the branch and was wrong 46 percent of the time he took the branch and it was wrong so that's pretty much an unpredictable brunch somehow right I mean but what is it I tracked him back to the triangle code no surprise so if you remember this is broadly what the triangle code looks like there's this calculation of the you position and it amounts to this you know we do a comparison with zero and if if it's less than zero then we branch off to the continue and if it's more than one and we brought you up to continue or so I've called it skip here you know this is the thing which jumps to back and goes round the loop to the next iteration so anyone sort of starting to wonder what seeing see anything there's only not in the crowd no no no really well let's take a look right appalling diagram that I put in at the last minute this is like what the triangle might look like and think we're casting essentially completely random raise around this scene and these triangles are tiny and there are loads of them the chances of any one ray ever actually intersecting with a particular triangle is diminishing Li small but the chances of the ray intersecting with the plane that the triangle is on is almost one right unless they're absolutely coplanar there will be some point probably off a hundred and miles away where a raise is colliding with thee or is intersecting with the plane so that X there the little ray plane intersection point is totally random so the chances of it being less than zero a random 50/50 the chances of it being greater than one are random 50/50 and almost totally anti-correlated with the first it's only if the the Ray happens to be in the triangle that the branch should be skipped in this case so to summarize 0 less than 0 so u less than 0 is unpredictable you greater than 1 is unpredictable but if I was a betting man I'd say that the combined comparision of them like did this Miss is pretty predictable you could just guess that it does miss moreover you should be able to predict like if I actually combine you and the V and say if this this is inside the triangle that should just be like yeah it's never inside the triangle and I'll take the hit the occasional times that I'm wrong and come back and do the work for when I'm actually hitting a triangle you may notice there's an asterisk on the should be predictable this is very specific to the compiler the compiler has a choice when it's doing this comparison when it has a sequence of comparisons if it knows that there's no side effects in the expressions that are order together it could choose to combine them and run them all anyway but obviously most of the time if you're doing a logical or as soon as one of them is true the compiler is kind of forbidden for looking at the remaining things so it kind of has to use the u.s. less than zero first and then the u.s. greater than one unless he can prove there's no side effects of course there are no side effects and so I moved the code around a little bit to make it look like this and I was actually waiting outside my son's school to see his teacher so I have a laptop open on asking away and I'm actually get it to go super fast which you'll see in a second and I was so excited I closed the laptop lid I had my parent-teacher conference and then I came back out and I tied it up the code and checked it in and then the next morning I checked it and it was slow again and I thought did did I hallucinate that I might am I really going I mean maybe I maybe I dreamt it and what had happened is that when I'd combined them the first time around and I was sort of get stashing and get unstacking to sort of compare before and after I'd combined them but I hadn't done them this way around I'd actually put the the check of V is less than zero first and for whatever reason inside the compiler that triggered it to combine all those conditionals into one but when I looked at it to check it in I went oh it's so much nicer to write it U is less than zero or U is greater than one rather than the V check and that tripped it up and he decided to use lots of jumps again so it's really sensitive so I did this oh I'm now using a bitwise or and that basically ties the compilers hands a little bit so the compiler has to do these conditions separately and if it's if it knows what's good for it it'll use a seam of or something similar and then it alls them all together and then that's the thing that I make the decision on oh let me show that concretely so it looks a bit like this it's horrible I so I'm actually doing more work right I'm always calculating U and V and then I'm using this horrible bitwise board trick but it makes it 36 percent faster so that the fact that we go from the several unpredictable branches back to back to a bunch of extra work all the time and then one predictable branch made all the difference here those so I back ported it into all of the other styles and you can see that mostly they got better as well so that now the FP and the OO and the DoD in that Cornell scene that top line they are broadly as I would have expected you know there's nothing fundamentally different with with functional programming versus object-oriented programming that's that's great right there I was thinking it was like some problem with the range library or some problem with me making these Mersenne twister ZnO it was the branch predictor all along the owl has stayed broadly the same oh because I'd already made this change sorry oh that's a bad line unfortunately the OO version well I never really understood why it didn't suffer from this the first time around and I can just wave my hands and say I haven't had time to work it out and more puzzling something which happened when I was redoing all of this in my hotel room earlier this week is that the Susann monkey is now actually slower than it was before on the OO with this trick so I guess the takeaway from this is that there's a load of really cool and interesting things that go on inside your CPU which is not the talk you came to I know and it's subtle and the compiler gods a quick to anger as the branch predictor so this needs more work if I had more time both to have developed the code and to talk I'd go into a lot more detail on some of the other interesting things I found along the way as I say the codes on github and I encourage you to take a look I have got threading in here and I tried using stood futures and that was a lot of fun until it wasn't I had a bunch of D virtualization slides that I had to take out because the compilers as I say there I was giving them a hard time a minute ago but they really do do amazing things these days future directions it annoys me that I don't support transparency that's like the number one thing to do isn't is to have a nice transparent sphere somewhere in your rate racing demo I would have liked to put that in I'd have loved to have made the monkey the Susann monkey made out of jade there's some amazing things you can do as a subsurface scattering and it's just it's such a remaining and interesting topic computer graphics and I realize I'm totally mature so I mean I'm aware there may be people in here whose job it is to do this all day I would have loved to actually go back to the data oriented design and spend more time on it one of the limiting factors I've got here is that because I was using the vector three class that ties X Y and Zed together and maybe actually I should have paralyzed that array as well and had X X X X X all you know all the triangle X's all the triangle Y's and all the triangle Zed's and then perhaps been able to use more sim D type things but you know again that really starts to tie you into a particular implementation although it is an early version of the code incidentally I developed while I was on a break between jobs and I was live streaming it all so I'd like to say a big thank you to all the people who came onto the YouTube channel and offered advice and emailed me afterwards and and had some in particular Luca was awesome so I want to say thanks to them and really I guess my my conclusion is go learn your own language go and pick something cool that you like to do make it a rave you know Mandelbrot raytracer emulator sound generator anything go and use it and and have fun because programming is fun right seems I have five minutes if anyone has any questions or comments no is anyone from the games industry or the rendering industry gonna tell me how I'm doing it wrong huh Oliver in the front rows raised the Sam but I think he's being just basically okay well thank you all for coming I've really appreciated this opportunity Tokio

Info

Channel: CppCon

Views: 56,485

Rating: undefined out of 5

Keywords: Matt Godbolt, CppCon 2019, Computer Science (Field), + C (Programming Language), Bash Films, conference video recording services, conference recording services, nationwide conference recording services, conference videography services, conference video recording, conference filming services, conference services, conference recording, event videographers, capture presentation slides, record presentation slides, event video recording, video services

Id: HG6c4Kwbv4I

Channel Id: undefined

Length: 55min 41sec (3341 seconds)

Published: Fri Oct 11 2019