I added multithreading to the world generation in Godot

Captions
Hello friends. In this video I will tell you my story about improving the performance of Blastronaut's world generation. Let's call this a development novel: it's relatively long and contains some technical stuff.

One of the main selling features of Blastronaut is its infinite procedural world generation. It is important because it allows the players to keep digging and exploring further and further, with the goal of finding better resources. Unlike in other games where the players can also dig through the world, in Blastronaut the player can do it extremely fast and can quickly get to the bottom of any pre-generated world. Having an infinite world that is generated at runtime pretty much removes this barrier. But this great feature has also caused me a lot of headaches, and so far I have re-implemented it quite a few times in order to get sufficient performance.

To understand the problem better, this is how the world generation works in principle. All the world's data is stored in tilemaps, and in Blastronaut there are quite many of them. The player can only see two tilemaps, called tilemap and tilemap back. In addition there are some invisible tilemaps, like the tilemap data, which contains the type of the block, such as whether the block is a mushroom or a crystal. Then there is the tilemap modified, which stores all the blocks that are destroyed or modified; this is used to load and save the game. Finally there is the tilemap chunks, which I added recently and which contains meta information about all the generated chunks, so when turned on it looks like a minimap of the game.

The tilemap chunks also has a script attached that loops through the chunks that are visible or in close proximity, and it queues all the empty chunks for generation. Every frame, a few of these chunks are taken out of the queue and sent for processing. This first round is called generate chunk, and it only calculates whether or not there is a tile, based on multiple layers of noise. Once the chunk and all of its neighbors are generated, the chunk is put into another queue and goes through a similar process, but this time the actual visuals of the tiles are calculated. This also means that the tiles have to look at their neighbors: for example, to place the grass correctly it has to know about the block below, otherwise the grass might just appear in mid-air.

All right, that's the algorithm in a nutshell. So far it's working quite well, and it's even performant enough to power the current WebGL build. However, I'm still a little worried about it. As the game grows I want to generate more stuff, and in a multiplayer game the host needs to generate the surroundings not just for himself but also for all the other players currently in the game. I'm quite worried that the current solution won't scale in the long run.

The obvious solution to this problem is to move these expensive calculations to threads, so the operating system can balance those tasks over multiple idle processor cores. Luckily, in GDScript it's quite easy to create new threads and assign tasks to them, and that's what I started doing. Eventually I decided to use Godot's ThreadPool plugin made by zmarcos. It has an implementation of futures, which means that I just have to submit a task, and when it's finished it sends back a signal that contains the future object, which has the task results attached. The main reason I needed the futures is that Godot doesn't allow modifying anything in the scene tree from anywhere other than the main thread, and this includes all the tilemaps and the tiles they contain. Whenever I tried to set a tile in another thread, it automatically gave an error. The workaround is to use the thread to fill a dictionary and send it back to the main thread, which loops through all the values and puts them into the tilemap.

Implementing it took me about two days of work, but eventually I got it working and started evaluating the results. And the result was that there was no difference in performance. I even thought that maybe Godot's profiler doesn't show things correctly, so I made a build and dug up my Surface Pro 3, which has an i5 in it but gets hot and starts dropping frames under heavy load, and this makes it the perfect tool for performance testing. But even then both builds, the one with threading and the one without, performed exactly the same. Furthermore, I couldn't see any workload difference in the Task Manager. In the end I felt quite defeated. I created a new branch and put all the multithreaded code there to wait for better times.

These better times came this week, which is about three weeks later, because I just couldn't let it be and I had an idea. In the meantime I had made a test project that just calculated 100,000 iterations of pi on all the available cores, and this was enough to give some work to all 32 threads in my main PC. So clearly the threading was working and I had made a mistake myself. My idea was that even though I didn't set the tiles in a thread, I was still reading them from the tilemap, which I thought might block the threads from running in parallel. So this time I first copy the tiles into a dictionary and then pass the dictionary to the thread as an argument, which means that all the thread does is calculate values between its own local dictionaries.

About another day of refactoring later I finally got it working, and this time the result was still exactly the same: no distinguishable performance difference. When I compared the profiler data it looked a little bit off. The problem was that the functions that run on the main thread still took much less time than the overall frame time. It almost looked like the threaded functions still contributed to the frame time, and overall I got really puzzled about what Godot's profiler actually shows.

So I discussed it with different people and eventually asked for help on Godot's Discord, and quite amazingly a very nice person started helping me out. I shared my profiling results and code snippets with him, and when I showed my task completed method he told me that this might be the problem: I'm just setting too many tiles. Even though in the profiler the task completed time was quite negligible, it turns out that every group of tiles I set increases the overall frame time quite a lot. So yeah, this is where the dog is buried: compared to the rest of my generation algorithm, the set tile call is actually the slowest link.

So what's the conclusion? It seems that threading the current generation code does not provide a big enough benefit, and what benefit there is gets mostly negated by the additional calculations I have to do in the main thread to prepare the data for the threads and resolve the results. So in the end I think I will revert to the old single-threaded code. The reason is that threading also has some major downsides. First, it makes the code more complex, because the generation can't work with tilemaps but has to rely on dictionaries. Secondly, with the multithreaded code I had some minor visual artifacts at the edges of the chunks, which are really hard to get rid of. And finally, at some point during development I got a Windows blue screen error with a SYSTEM_THREAD_EXCEPTION_NOT_HANDLED message. I have no clue what caused it, but I don't want to get it again.

So in the end I will leave it as a sad development story. Although I have some ideas about how I can reduce the number of times I am setting the tiles, I also want to add more visible tilemaps, because according to the latest concept art I made I also need some foreground tiles for adding more vegetation. But I hope it was an interesting development story and that you won't make the same mistakes that I made. That's all. Thank you for watching, and don't forget to wishlist Blastronaut on Steam.
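
The pattern described in the video (snapshot the tiles into a dictionary, let worker threads compute visuals from that snapshot only, then apply the results on the main thread) can be sketched roughly as follows. This is a minimal illustration in Python, not the game's GDScript code: `FakeTileMap`, `has_tile`, `generate_chunk`, `compute_visuals`, the chunk size and the "solid below row 2" rule are all hypothetical stand-ins, and Python's thread pool is used only to show the data flow, not to demonstrate any speedup.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK = 4  # hypothetical chunk edge length, in tiles


def has_tile(x, y):
    """Stand-in for the layered noise lookup: ground is solid from row 2 down."""
    return y >= 2


def generate_chunk(cx, cy):
    """Pass 1: decide only whether each cell of the chunk holds a tile."""
    return {(x, y): has_tile(x, y)
            for x in range(cx * CHUNK, (cx + 1) * CHUNK)
            for y in range(cy * CHUNK, (cy + 1) * CHUNK)}


def compute_visuals(solid, cells):
    """Pass 2, run on a worker thread: reads only the snapshot dictionary."""
    visuals = {}
    for (x, y) in cells:
        if solid.get((x, y)):
            visuals[(x, y)] = "dirt"
        elif solid.get((x, y + 1)):  # y grows downward: (x, y + 1) is the block below
            visuals[(x, y)] = "grass"
    return visuals


class FakeTileMap:
    """Stand-in for a scene-tree tilemap that only the main thread may modify."""

    def __init__(self):
        self.cells = {}
        self.set_calls = 0

    def set_tile(self, pos, tile):
        self.cells[pos] = tile
        self.set_calls += 1


def run_demo():
    chunks = [(0, 0), (1, 0)]
    per_chunk = {c: generate_chunk(*c) for c in chunks}
    solid = {}
    for data in per_chunk.values():
        solid.update(data)

    tilemap = FakeTileMap()
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Each task gets its own copy of the snapshot, never the tilemap itself.
        futures = [pool.submit(compute_visuals, dict(solid), list(per_chunk[c]))
                   for c in chunks]
        # Back on the main thread: apply every result with one set_tile call each.
        for future in futures:
            for pos, tile in future.result().items():
                tilemap.set_tile(pos, tile)
    return tilemap


tilemap = run_demo()
print(tilemap.set_calls, "set_tile calls on the main thread")
```

Note that every computed tile still ends up as one `set_tile` call on the main thread. The `set_calls` counter makes that cost visible, and it is the part the video identifies as the slowest link regardless of how the values were computed.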
Info
Channel: Perfoon
Views: 1,683
Id: XzEY38ej91Y
Length: 10min 25sec (625 seconds)
Published: Sat Feb 20 2021