My Computer Taught Itself to Play Minecraft

Video Statistics and Information

Reddit Comments

Hey everyone! I’ve been working on a Minecraft reinforcement learning project (genetic algorithm based) off and on for a little under two years, and I decided to make a YouTube video about the project and its results. The main goal was just to write all of the code from the ground up and see what the network would be capable of, and I’m pretty happy with the results overall. If this sounds interesting to you, feel free to give it a watch. All of the code for the project can be found on my GitHub: https://github.com/Poppro

👍 1 · u/Poppro · May 13 2021
Captions
Hello everybody, welcome to the first installment of my new reinforcement learning series. Today we're going to be teaching a computer to play Minecraft. Before we begin, we have to figure out what it even means to play Minecraft. As anyone who's watched Minecraft on YouTube knows, there's a huge amount of variation in how the game is played in the community. For example, we could try to have our bot go all the way to fight the Ender Dragon, but for the purposes of this video let's take it easy on the computer by sticking true to the game's name. Specifically, we'll reward the computer with increasing points for collecting dirt, stone, coal, iron, gold, and finally diamond.

Now that we understand the basic goal of the game, I want to give you a brief roadmap for where all of this is headed. The project is split up into three main components: a neural network, which will make decisions; a Minecraft simulator, which will train the network; and a Minecraft mod, which will visualize our network's learned behavior.

First, let's talk about neural networks. I'm sure some of you are familiar with neural networks, but if not, since this is the first episode of the series, let's do a quick crash course. Here's a network. The nodes on the left are input nodes; they just take in a number. These lines here are called weights, and they are simply factors that the number gets multiplied by. The nodes they connect to make up the middle layers, and each of those nodes takes the value of the weighted sum of the nodes feeding into it. Finally, on the right we have the output nodes. Whichever node has the highest value at the end of the process is typically considered the network's chosen node. This network is going to be the brain of our player.

Now we just have to figure out how to compress playing Minecraft into a series of inputs and outputs that can fit into it. To do so, let's think about how humans play Minecraft: we move around with the WASD keys, we look around with the mouse, we dig by clicking the mouse, and we jump by pressing the space bar. So our bot should be able to walk forward, turn right, turn left, look up, look down, dig, and jump. These will be the output nodes of our network. Our bot should also make decisions like humans. People make decisions in the game by looking at blocks on their screen, so it would be unfair to just tell our bot where every block in the game is. Instead, we'll employ a technique called raycasting, which shoots lines from the bot's point of view that collide with blocks. These blocks will be the input to our network. We'll also let our bot cheat a little bit, because humans cheat when playing Minecraft too: we'll let it know its vertical position in the world.
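To make that crash course concrete, here is a minimal sketch of the forward pass in C++. The layer sizes, the example weights, and the absence of an activation function are all illustrative assumptions, not necessarily how the actual library linked below is structured.

```cpp
// Minimal feed-forward sketch matching the crash course above: each layer's
// node value is the weighted sum of the nodes feeding into it, and the
// highest-valued output node is the network's chosen action.
#include <cstddef>
#include <iostream>
#include <vector>

// One fully connected layer: out[j] = sum_i in[i] * w[i][j].
std::vector<double> Forward(const std::vector<double>& in,
                            const std::vector<std::vector<double>>& w) {
  std::vector<double> out(w[0].size(), 0.0);
  for (std::size_t i = 0; i < in.size(); ++i)
    for (std::size_t j = 0; j < out.size(); ++j)
      out[j] += in[i] * w[i][j];
  return out;
}

int main() {
  // Toy sizes: 3 inputs -> 2 middle nodes -> 2 outputs. All values invented.
  std::vector<double> input = {0.5, -1.0, 0.25};
  std::vector<std::vector<double>> w1 = {{0.1, 0.4}, {-0.3, 0.2}, {0.7, -0.1}};
  std::vector<std::vector<double>> w2 = {{0.5, -0.6}, {0.3, 0.8}};

  std::vector<double> hidden = Forward(input, w1);
  std::vector<double> output = Forward(hidden, w2);

  // Pick the highest-valued output node as the network's decision.
  std::size_t best = 0;
  for (std::size_t j = 1; j < output.size(); ++j)
    if (output[j] > output[best]) best = j;
  std::cout << "chosen output node: " << best << "\n";
}
```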
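And here is a rough sketch of the raycasting input scheme just described: march a ray outward from the bot's eye and report the first solid block it hits. The World type, block ids, step size, and range are assumptions for illustration; real voxel raycasters usually use an exact grid walk (DDA) rather than fixed steps, but this shows the idea.

```cpp
// Raycasting sketch: step along a ray through the voxel grid and return
// the id of the first non-air block, which would become a network input.
#include <cmath>

struct Vec3 { double x, y, z; };

struct World {
  // Illustrative stand-in: returns a block id at integer coordinates,
  // with 0 meaning air. The real simulator's lookup goes here.
  int BlockAt(int x, int y, int z) const { return 0; }
};

// `dir` is assumed normalized; max_range and step are invented values.
int CastRay(const World& world, Vec3 origin, Vec3 dir,
            double max_range = 16.0, double step = 0.1) {
  for (double t = 0.0; t < max_range; t += step) {
    int x = static_cast<int>(std::floor(origin.x + dir.x * t));
    int y = static_cast<int>(std::floor(origin.y + dir.y * t));
    int z = static_cast<int>(std::floor(origin.z + dir.z * t));
    int block = world.BlockAt(x, y, z);
    if (block != 0) return block;  // first solid block hit by this ray
  }
  return 0;  // nothing hit within range
}
```

The bot would cast a small fan of these rays across its field of view and feed the resulting block ids (plus its vertical position) into the input nodes.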
Now that we're oriented, first things first: we need to code a neural network. [Music] We now have a C++ static library that simulates neural networks and can be used for a lot of future projects. If you're feeling particularly adventurous, you can check it out on GitHub using the link in the description.

Next we need a simulator that our bot can play in, but before we jump back into development, let's revisit the roadmap to better understand why we need it. The job of the simulator is to train the network with reinforcement learning by running millions of Minecraft games. The basic process looks like this: we first create an instance of the world in our simulator and assign a neural network to the instance. Then we create 99 copies of the bot, each with slightly different neural weights than the others. This process is called mutation, and it mirrors genetic mutation in evolution. With all 100 instances loaded into memory, we start the game and let each of the bots try to collect as many points as possible. The bot that gets the most points is selected as the winner, and its neural weights are passed to the next generation, or epoch. By repeating this thousands of times, we eventually get better and better bots.

This is all great, but we still don't even know how to load a Minecraft world into C++ in the first place. After taking some time to conduct a bit of research, I found a few programs that can help us achieve this. First, we'll load a Minecraft world into MCEdit. In MCEdit we can select the region of the world we'd like to load into our simulation and then export it as a schematic. From here we can use a rather obscure terminal-based program that I found on GitHub to convert the schematic file into JSON. Once the file is in JSON format, it's straightforward to load the blocks into our simulator, and the rest of the work is all standard logic.
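To pin down the mutate-and-select process described above, here is what one training epoch might look like. The Network layout, the Gaussian mutation noise, and the PlayGame stub are all assumptions for illustration; the project's actual training code may differ.

```cpp
// One genetic-algorithm epoch: clone the current best network 100 times,
// perturb every clone's weights slightly, let each clone play a game,
// and keep the highest scorer as the seed for the next generation.
#include <cstddef>
#include <random>
#include <vector>

struct Network {
  std::vector<double> weights;
};

// Stub: in the real project this runs one simulated Minecraft game and
// returns the points collected; here it just returns a placeholder score.
double PlayGame(const Network& net) { (void)net; return 0.0; }

Network RunEpoch(const Network& parent, std::mt19937& rng) {
  std::normal_distribution<double> noise(0.0, 0.05);  // assumed mutation scale
  std::vector<Network> population(100, parent);

  // Mutate every copy except the first, which preserves the parent unchanged.
  for (std::size_t i = 1; i < population.size(); ++i)
    for (double& w : population[i].weights) w += noise(rng);

  // Evaluate all candidates and keep the one with the most points.
  std::size_t best = 0;
  double best_score = PlayGame(population[0]);
  for (std::size_t i = 1; i < population.size(); ++i) {
    double score = PlayGame(population[i]);
    if (score > best_score) { best_score = score; best = i; }
  }
  return population[best];  // its weights seed the next epoch
}
```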
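As for that last loading step, here is a minimal sketch using the nlohmann/json library, under the assumption that the converter emits an array of block records like {"x": 0, "y": 64, "z": 0, "id": 1}; the real converter's output schema may well differ.

```cpp
// Sketch of loading converted world data into the simulator: parse the
// JSON file and index each block id by its integer coordinates.
// Assumes the hypothetical schema {"blocks": [{"x":..,"y":..,"z":..,"id":..}, ...]}.
#include <fstream>
#include <map>
#include <string>
#include <tuple>

#include <nlohmann/json.hpp>

using BlockMap = std::map<std::tuple<int, int, int>, int>;

BlockMap LoadWorld(const std::string& path) {
  std::ifstream file(path);
  nlohmann::json doc;
  file >> doc;  // parse the whole JSON document

  BlockMap blocks;
  for (const auto& b : doc["blocks"]) {
    blocks[{b["x"].get<int>(), b["y"].get<int>(), b["z"].get<int>()}] =
        b["id"].get<int>();
  }
  return blocks;
}
```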
With this as a starting point, let's go ahead and implement these ideas in code. [Music] You may have noticed a few cuts during that time lapse; honestly, it just would have taken too long to show the whole process. There's a lot of nuance that had to be omitted from the video, like dealing with the character's rotation and thread management, but all we need to know now is that we have a working Minecraft simulator for our neural network to play in. After spending a while looking for interesting worlds to simulate, I decided it would be best to start off with an empty field and just see what happens, so I'll go ahead and load this world into the simulator and get the training started. While that's running, let's get started on the Minecraft mod to visualize the network's behavior. [Music]

This mod was a lot more work than I was expecting, but we now have an AI mob that can read in data files from our network and visualize whatever decision the network makes, for example walking, digging, or looking around. In the meantime, our network wrapped up training. To offer some more context while we watch its progression, here's a visualization of the network the bot was trained on. As you can see, it's a lot more complicated than the toy models we were exploring earlier; in total there are about 11 million parameters, or weights, in the network, which theoretically will allow our bot to learn complex behaviors. Let's take a look at how its evolution progressed. [Music] [Laughter]

All right, jokes aside, let's take a look at the final epoch this version of the network was trained on. As we can see, the network is exhibiting some complex behavior, like looking down and jumping to dig ores that it otherwise couldn't access. However, watching it mostly just dig straight down isn't exactly satisfying from a human point of view, so I'm going to try to address that next. To promote more dynamic digging styles, we'll basically just stop rewarding the bot for digging straight down too many times in a row. This feels a bit like cheating to me, but it does make sense, as one of the oldest rules in Minecraft is to never dig straight down, or you'll probably end up in a pool full of lava. I'm also interested in how the network might evolve with a bit of variation, so I've scoped out a new world for the bot to explore. Most notably, the location starts in a cave, which, combined with our new rule, I'm hoping will promote more organic mining techniques.

I'll preface this next time lapse by saying I'm extremely pleased and impressed by what the network was able to accomplish. It exhibits quite complex behavior and was even able to creatively find an exploit for one of the bugs in my simulation code; see if you can find it while you watch. [Music] I really hope you think that was as awesome as I do. It honestly exceeded my expectations for what the network would be able to figure out. Also, if you didn't catch it, the bug in my simulator was that I didn't check whether there was a block above the bot before it jumped, but since the final network didn't end up relying on the bug, I'll just let it slide.

As a final point of interest, let's take a look at what the evolved neural network looks like. Here, weights that are close to negative one are green, weights that are close to zero are black, and weights that are close to one are blue. There seems to be some non-random structure visible in the network, but to get a better idea of how it's working, let's watch it make some decisions. [Music]
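A quick aside on that visualization: the described color scheme maps each weight onto a green-to-black-to-blue ramp, which might look something like the sketch below. The clamping and 0-255 channel scaling are assumptions; the mod's actual rendering code isn't shown in the video.

```cpp
// Map a weight in [-1, 1] to a color: -1 -> green, 0 -> black, +1 -> blue.
#include <algorithm>
#include <cstdint>

struct Color { std::uint8_t r, g, b; };

Color WeightToColor(double w) {
  double v = std::clamp(w, -1.0, 1.0);
  if (v < 0.0) return {0, static_cast<std::uint8_t>(-v * 255), 0};  // green channel
  return {0, 0, static_cast<std::uint8_t>(v * 255)};                // blue channel
}
```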
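One more sketch, going back to the reward tweak introduced earlier: track how many times in a row the bot digs the block directly beneath it, and withhold points past a cutoff. The threshold value and the exact bookkeeping are assumptions for illustration, not the project's actual rule.

```cpp
// Reward shaping sketch: stop rewarding the bot for digging straight down
// too many times in a row, so that other mining styles can compete.
struct RewardShaper {
  int consecutive_down_digs = 0;
  static constexpr int kMaxDownDigs = 3;  // assumed cutoff

  // base_points: the normal score for the block that was just mined.
  double ScoreDig(bool dug_straight_down, double base_points) {
    if (dug_straight_down) {
      ++consecutive_down_digs;
      if (consecutive_down_digs > kMaxDownDigs) return 0.0;  // streak too long
    } else {
      consecutive_down_digs = 0;  // any other dig resets the streak
    }
    return base_points;
  }
};
```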
With that, I'm happy calling this project a success. That said, if any of you are interested in running some tests of your own, all of the code used in this project can be found on my GitHub page, so feel free to go wild and let me know if you make any interesting progress. I have another reinforcement learning project coming out soon that I'm really excited about, involving a specific board game, so in the meantime be sure to like, subscribe, and hit the notification bell so you don't miss it. Thanks for watching!

Info
Channel: Poppro
Views: 622,076
Id: IQ7sK6PezJo
Length: 14min 47sec (887 seconds)
Published: Thu May 13 2021