Character Control with Neural Networks and Machine Learning

Captions
Okay, so these are two Assassin's Creed games: Syndicate and Origins, the most recent one. Syndicate came out in 2015 and Origins came out in 2017, and if we look at the size of the maps in these games we see that it has grown incredibly, about 250 times larger in the span of a couple of years. That's the kind of increase in scope which players expect from Ubisoft games. But this talk is about animation, so does anyone want to hazard a guess how many animations were in Assassin's Creed Origins? There were roughly 15,000 animations in that game. It had about a three-year development cycle, which means on average about 20 animations were added per day. That's 20 animations a day that need to be recorded in the motion capture studio, cleaned up by motion capture technicians, processed, edited and tweaked by animators, put in game by a designer, and then tested and QA'd by testers. That's an incredible amount of work, and the question is: what are people going to expect from Ubisoft games in a few years' time in regards to animation? Are they going to expect 100,000 animations, or a million? What sort of scale are we talking about?

I work for La Forge, which is Ubisoft's main R&D department in Montreal, and one of the initiatives I'm working on is basically trying to tackle this question: how can we prepare for this scaling up, which is bound to happen? I've been working with a bunch of people there, in particular Simon Clavet, who did some previous work on motion matching. What I really want to show you today is our philosophy and how we've been thinking about how we might be able to scale these systems up.

Let's have a look at the background. The most high-level overview of an animation system looks something like this: we have some player input, it goes into a black-box animation system, and as output we get a pose. But we don't send the button presses directly; we convert them into some kind of high-level representation of what we want, like where we want the player to go and which direction we want them to be facing. This we're going to call gameplay, and one nice thing is that we can also take inputs from NPCs or other sorts of controllers.

What we typically do with this animation system, or at least the way it has been done in many of the large Ubisoft games, is that inside we have a state machine. There are a bunch of different nodes in the state machine, each representing a state the character can be in, which roughly means which animation the player is playing. Here we might have a state meaning the character is in locomotion; we can click on this state and it pops open, and inside are many, many more states. We can click on another one of those, maybe a walking state, and inside there are a whole bunch more. So this state machine is hierarchical: inside there are more states, more transitions and more logic for how to move between states. Maybe we click on another one, say the turn-right mode of the walking state of locomotion, and it pops up another state machine, and we can click again: maybe here we have the turning-right-to-idle state of the walking state machine of locomotion, and so on. We click on this one and what pops open is something different: inside here we have a blend tree.
A blend tree basically describes how we blend a bunch of different animations, and under what conditions, to produce the final pose we give as output. We can click on one of the nodes in the blend tree as well, and it pops up another blend tree, so this thing is hierarchical too. We go down again, and now maybe, finally, we get to the data. Inside this final node, all the way down, we have a clip which we've recorded in the motion capture studio, maybe touched up by animators, and the file name looks something crazy like this: some long string describing exactly what's happening in the clip. So our data flow looks like this: we get our input from gameplay, we go all the way through this crazy state machine, through the blend tree, all the way down to the data, and then we output the character's pose based on that.

You may think I'm exaggerating for comic effect, but actually it's much, much worse than you could ever imagine. Like I said, there were 15,000 animations in the latest Assassin's Creed. I asked them to run some stats for me: about 15,000 animations, about 5,000 states in this state machine, and it's about 12 levels deep. That's the kind of magnitude we're talking about.

Okay, you may think: that's how it is, but it shipped many successful games and made lots of money, so what's the problem? The problem is this. You're the person maintaining the state machine, and a director comes to you one day and says: we want the player character to be able to be injured, we want him walking around injured. So you think about what you could do to add this to the state machine. One option is to add an injured node at every single leaf state, with a transition that switches between being injured and not. But some states don't make sense when injured, so maybe whole parts of this hierarchical state machine don't apply, and you need to go through the entire thing and work out where being injured makes sense and where it doesn't. Another option is to duplicate the whole graph and replace all the data in it with injured versions of the same data, but you have a similar issue: for some states, some leaves, you might not have recorded any injured data, or injured data might not make sense, so you might have to jump back to the original tree at random points. Finally, you could just hack something together for this one case, but if you do that too often the technical debt builds up and you start to get into a lot of trouble. Anyway, your boss wants this, so you do something. Then some time later the director comes back and says: great, in the next scene the character is injured, he's also tired, and he gets stabbed in the eye halfway through, so we need locomotion for all of these different things. And then your reaction is something like this.
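As a rough, purely illustrative sketch (not from the talk, and nothing to do with Ubisoft's actual tooling), this is how quickly the number of variants to author and test multiplies once variables like "injured" are layered onto a fixed set of leaf states; the state names are made up:

```python
# Illustrative only: counting how many state variants a few extra boolean
# variables create when every combination needs authoring, wiring and testing.
from itertools import product

base_states = ["idle", "walk", "walk_turn_left", "walk_turn_right",
               "run", "run_stop", "jump", "fall", "roll"]

injured = [False, True]
print(len(list(product(base_states, injured))))               # 18 variants

tired = [False, True]
stabbed_in_eye = [False, True]
print(len(list(product(base_states, injured, tired, stabbed_in_eye))))  # 72 variants
```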
The dream would be a setup more like this. Day one, the director comes to you: we want the player character to be injured. Day two, we go to the motion capture studio and capture a bunch of injured motion; we limp around or whatever. Day three, we grab all these files from the motion capture studio and drag and drop them into our system, and day four, everything works. Day five, the director says: okay, now we want the character to be injured, tired, and stabbed in the eye. So we go back to the motion capture studio, stab ourselves in the eye, drag and drop, and everything works. Great.

What this is really about is scalability: how can we have a process for building animation systems which is scalable, and which doesn't give us a headache every time we want to add something new? That's what this talk is all about, and these are its three main ideas, ideas that in some way come from machine learning. What we ideally want is a generalized solution which can work for many different cases; to get this, we need to specify exactly what variables we want in the system; and to get that working well, we need to think more carefully about how we manage our data. These are the three stages I'm going to talk about.

First, data separation. If we look again at the setup we had before, there's something awkward: conceptually, all the data is living inside the state machine. The first conceptual step is to have a separate database, to pull the data out and have it live in that database, and to have the state machine output not a pose but a pointer of some kind, a file name with a time. Of course you're thinking: obviously, this is how we actually do it in practice; we don't literally have the data living inside the state machine. But there's an important conceptual difference here. If we want to blend, we can have the blending happen after this data retrieval stage: we ask for multiple different animations and we provide the different weights as well.

The important reason for separating out this database is that the first thing we can then do, which is really nice, is get rid of the craziness with the file names. We have file names in our database and a file name, or some sort of pointer, coming from the state machine, and the first step is to replace everything that was encoded in that file name with tags. Now what the state machine outputs is a list of tags describing what it wants the motion to be like, and we have the same kinds of tags in our data. For example, this is a little prototype tool we have for tagging our data. Here we have a really long take, about 15 minutes of raw animation data, and we've gone through and tagged all the different sections with what they represent; in a sense this is the same as editing, cutting the clip up into small sections. Your tags can be as detailed or as simple as you like: if you have a cutscene, you can have a unique tag just for that cutscene and your state machine can output it, or you can have very general tags like walking, locomotion or turning. So you have a lot of flexibility when you use tags instead of file names, and you can also tag multiple ranges inside a single file.

That already gives you a lot of nice things. One is that your state machine development doesn't depend on your database, so you can update the state machine and the database separately.
You can also swap out the database for new characters, keeping the state machine logic exactly the same and just changing the database if you have a different style of character, or you can have a fallback database: if the state machine requests a set of tags which aren't in your database, you fall back to a more general one. The overall idea is: move away from thinking about assets and start thinking about databases as a whole, move away from file names and start thinking about tags, and have a separate process for motion retrieval. What we've basically done is taken this classic game development setup, with file names, assets, tags and editing, and moved it into a database, which is really how machine learning people think about data. That's the first stage done.

Now for specifying the desired variables. Let's look at our setup again. We've improved some things, but we still have this really huge state machine, and it's a complex thing because it's a mix of gameplay and animation. Some states are purely aesthetic: a "turn 25 degrees" state doesn't really have any meaning in gameplay, it's just there so we can play a slightly different animation. Some states are important for gameplay: if the character is falling over, or doing a roll, that's actually a different thing in gameplay, and you can perform different actions depending on those states. It would be really nice if we could separate this big state machine into purely gameplay-related states and purely aesthetic ones.

So one thing we can do is this: we take our state machine and keep only the simplified gameplay version, only the states which have some meaningful relation to gameplay, and we remove all the aesthetic states and pull in everything related to aesthetics from outside. For example, the fact that the character is male is a kind of global variable we can get from outside, and things from gameplay like where we want the character to be going, what speed they should be moving at, and where they should be looking don't change the gameplay at all, they're purely aesthetic, so we can pull them straight through and bypass the state machine. We then tag exactly the same variables, exactly the same properties, in our data. These are actually numerical values, and that's the main difference now: they're not just tags, some of them are numerical values which we also have labelled in our database. So we need to change a little how we look up which clip to play next, and the basic idea is: filter out the clips where the tags don't match, then return the nearest numerical match over the remaining numerical inputs. We're going to call this matching, because it's not really querying a database any more; it's trying to find the best match for the desired inputs.
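As a minimal sketch of this matching idea (my illustration, not Ubisoft's code; the clip names, tags and feature values are made up), filtering on tags and then taking the nearest numerical match might look like this:

```python
import numpy as np

# Each entry: a set of tags, a vector of numeric features (desired trajectory,
# speed, facing, ...), and a pointer to (file, time). All values are hypothetical.
database = [
    {"tags": {"locomotion", "walk"},            "features": np.array([1.2, 0.0, 0.9]), "clip": ("take_03.anim",  4.5)},
    {"tags": {"locomotion", "walk", "injured"}, "features": np.array([0.8, 0.1, 0.6]), "clip": ("take_07.anim", 12.0)},
    {"tags": {"locomotion", "run"},             "features": np.array([3.5, 0.0, 3.4]), "clip": ("take_03.anim", 31.2)},
]

def match(desired_tags, desired_features):
    # 1. Filter: keep entries whose tags contain everything gameplay asked for.
    candidates = [e for e in database if desired_tags <= e["tags"]]
    if not candidates:
        return None  # in practice: fall back to a more general database
    # 2. Nearest numerical match on the remaining continuous inputs.
    best = min(candidates,
               key=lambda e: np.linalg.norm(e["features"] - desired_features))
    return best["clip"]

print(match({"locomotion", "walk"}, np.array([1.0, 0.0, 0.8])))
```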
Something almost magical happens now, which is that the state machine sort of disappears into gameplay. Maintaining the state machine becomes the role of the gameplay programmers, and as animation programmers what we're really doing is the stuff on the left-hand side. If we treat the state machine as purely a gameplay construct, our animation system looks more like this: all we have is this matching component, where we take the user input and match it to something in the data.

There's one extra step, which is that we don't want gameplay to specify the timing of the animation. We want gameplay to say what sort of animation it wants, but not to say it wants an animation that is halfway through playing, or anything like that. One thing we can do is use the previous pose output by our system to describe the timing. Maybe the previous pose had the character with the right foot down; now we know that when we match our next frame, or the next clip we want to play, it should start with the right foot down.

This setup is conceptually extremely similar to what shipped in For Honor, which was called motion matching at the time: there was a gameplay state machine, tags and numerical variables coming in, and this kind of matching happening against the database. Here's a little clip from For Honor where you can see a bit of how it works. We see all the different potential clips in the database which could be played next, and the system picks the one that most closely matches the desired user input; here the desired user input is the red trajectory along the ground, and it picks the clip which best matches it. One of the really nice things about this setup is that, unlike the big state machine, new variables are really easy to add. For example, if we wanted to add an injured state to the character, we just add an additional input saying whether the character should be injured, and add it as an additional tag in our database. So the idea is: instead of states, we describe the animation we want by variables; instead of querying a database, we do a fuzzy matching which roughly finds the best clip; and we annotate these variables, these tags and numerical values, in the data.

That gets us to the point of motion matching, something that has shipped in For Honor and been proven pretty effective. So what's next? How far can we push this? That's the third point, which is about generalizing the solution. A minor warning: when we talk about generalization, the way we often generalize is with math, so there are a couple of equations; I hope that's okay.

Let's look at our setup again. The first thing we can do is move the pose over to the right-hand side; the fact that it loops around doesn't really matter, and we can just consider it part of the input. Then we can think of this matching process as a mathematical function which takes a big list of numbers, a vector, and produces another vector, another big list of numbers. We can represent the input like this: an enumeration, a one-hot vector, saying whether the character is male or female; similarly, a one-hot enumeration for whichever general state they're in, like locomotion, idle or falling; similarly for the style. The numerical values we can give directly: the position the character wants to be at in the future, the speed they want to be going at, where they're looking. And the same goes for the pose: if the pose is our output y, we can represent it by the position and rotation of each joint, first joint position, first joint rotation, second joint position, second joint rotation, and so on up to the final joint.
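A small sketch of what building these vectors could look like (illustrative only; the category lists, dimensions and names are assumptions, not the actual representation used in the talk):

```python
import numpy as np

SEXES  = ["male", "female"]
STATES = ["locomotion", "idle", "falling"]

def one_hot(value, categories):
    v = np.zeros(len(categories))
    v[categories.index(value)] = 1.0
    return v

def encode_input(sex, state, target_pos, target_speed, look_dir):
    # Enumerations become one-hot vectors, numerical values are given directly.
    return np.concatenate([
        one_hot(sex, SEXES),
        one_hot(state, STATES),
        target_pos,
        [target_speed],
        look_dir,
    ])

def encode_pose(joint_positions, joint_rotations):
    # e.g. N joints * (3 position + 4 quaternion) numbers, flattened into one vector
    return np.concatenate([np.ravel(joint_positions), np.ravel(joint_rotations)])

x = encode_input("male", "locomotion",
                 target_pos=np.array([1.0, 0.0, 2.0]),
                 target_speed=1.5,
                 look_dir=np.array([0.0, 0.0, 1.0]))
print(x.shape)  # one big list of numbers
```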
So whatever we give as input and whatever we give as output, we can represent both as huge lists of numbers, and we can see our matching as a mathematical function that maps from one to the other. When we think about our database in this way, what we actually have are pairs of x's and corresponding y's: for each pose in the database, each y, we have all the associated variables and tags, which are its associated x. So our database is really just pairs of x's and corresponding y's, and our matching function is a function that uses this database to map from x's to y's. This is exactly what the machine learning community calls supervised learning: using a database of x's and y's to learn a mapping from x to y. And we don't call this matching in the machine learning community, we call it regression. Motion matching is a special case of this regression called nearest neighbour regression, so we can think about all of these things in terms of existing machine learning concepts.

I'll explain how that works. In nearest neighbour regression, we take the x we get as input, calculate the distance to all the x's in our database, and return the y whose x has the smallest distance to ours. For example, here we calculate the distance to all these x's, we see that x1 has the smallest distance, so we return the pose y1. When we do this matching and find the nearest numerical match, it's exactly the same as doing nearest neighbour regression.

All of this has been a little conceptual, so let's see a real example. Here's some of the training data we use: really just a long 15-minute take of unstructured locomotion. We have about 200 megabytes of this training data, with various characters doing all sorts of things. As our input x we give the previous frame's joint positions, their velocities (how fast they're moving), and also our target: where we want the character to be in one second's time, what velocity we want them to be moving at, and which direction we want them to be facing. Our output y is a block of animation: a one-second block of all the joint positions and rotations. And we call our function f every second: every time we need a new block, we call it, get the new block, and put it in; or if the user input changes, if they specify a new desired position or direction, we call it straight away and get a new block straight away.
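A minimal sketch of nearest neighbour regression over such a database (illustrative; the array sizes are made-up stand-ins for the real mocap data):

```python
import numpy as np

def nearest_neighbour_regression(x, X, Y):
    # X: (N, d) inputs, Y: (N, k) corresponding outputs (here, flattened animation blocks).
    distances = np.linalg.norm(X - x, axis=1)  # distance from x to every x_i in the database
    return Y[np.argmin(distances)]             # return the y of the closest x_i

# Tiny fake database just to show the call; in the talk this is ~200 MB of mocap.
X = np.random.randn(1000, 64)
Y = np.random.randn(1000, 200)   # each row stands in for a one-second block of poses
y = nearest_neighbour_regression(np.random.randn(64), X, Y)
```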
So let's see what happens if we use nearest neighbour regression. We get a system like this: the character follows the desired user input, and the most noticeable thing is a kind of click. Every time the nearest neighbour regression chooses a different clip to play from, you get an instantaneous jump where it switches clips. But we can add some blending between clips to make sure they blend smoothly into each other, and then we get quite a nice system. It's pretty responsive, and most importantly we get lots of diverse, interesting locomotion; it doesn't look particularly formulaic, and we get lots of variety as well. This result looks pretty much like most motion matching you've seen, because that's basically what it is: it's essentially motion matching, but under a more general framework. The memory and runtime are both fairly reasonable, about 200 megabytes of data and about one millisecond, so it's obviously not fast, but for the amount of variety and different motions you see, and for the simplicity of it, it's pretty good.

There's one interesting thing about framing this more generally as we have: there isn't just a handful of types of regression, there's an insane number of them. This is the contents page from a supervised learning library called scikit-learn, and nearest neighbour regression is down here, so it's not some obscure edge case, it's actually quite a popular machine learning algorithm. And there are others. For example, here's an algorithm called Gaussian processes; these were pretty popular for a long time before neural networks, and basically they do a smooth interpolation of the data. So let's see if we can just swap nearest neighbour regression for Gaussian processes. We take our database, train a Gaussian process, and plug it in as our function f, and what happens is something like this: it basically doesn't work. It looks pretty bizarre and we're not really sure what's going on. Gaussian processes are limited in scalability; they scale really poorly with the amount of training data, and I could only use about a thousand samples for training, so maybe we just didn't use enough data.

Let's look again at the contents page. Oh, there are neural networks, great, the hot thing at the moment, so we can try those and see what happens. One nice thing about them is that we have virtually unlimited data capacity, we can throw away the data once the network is trained, and they're fast to evaluate with low memory usage. As far as our goal of scalability is concerned, they seem like the ideal thing to use.

Okay, how many of you know how a neural network works? A fair amount, that's good. I'm going to give my little five-minute rundown of neural networks, so don't worry, it will be quick. A neural network, like all machine learning algorithms, is just a function. For example, this is a simple function you'd recognize from high school: it takes some input and produces some output, and as we've seen, these are represented by vectors, big lists of numbers. A single layer of a neural network is described by a function that looks like y = activation(Wx + b). The variables W and b are the weights of the neural network, and the input x and output y are on either side. The first operation is to multiply the vector x by the weight matrix W, then we add another vector, the bias b, which is another set of weights for the network, and then we pass the result through an activation function, a simple function that produces some sort of bend, a non-linearity, in the output. So it's actually super simple and super familiar; it looks just like that basic function from high school.
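A tiny sketch of that function, assuming a ReLU activation and made-up layer sizes (this is the generic textbook form, not the exact network from the talk):

```python
import numpy as np

def layer(x, W, b):
    return np.maximum(0.0, W @ x + b)   # activation(Wx + b); ReLU supplies "the bend"

def network(x, params):
    h = x
    for W, b in params[:-1]:
        h = layer(h, W, b)
    W, b = params[-1]
    return W @ h + b                    # final layer left linear

# Randomly initialised weights, purely for illustration; training would adjust
# every W and b by comparing predictions against the y's in the database.
rng = np.random.default_rng(0)
params = [(rng.normal(size=(128, 64)),  np.zeros(128)),
          (rng.normal(size=(128, 128)), np.zeros(128)),
          (rng.normal(size=(220, 128)), np.zeros(220))]   # 220: hypothetical pose size
y = network(rng.normal(size=64), params)
```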
When we stack layers to get a deep neural network, we're basically nesting this equation inside itself, and this equation is exactly the one we use for our f in this machine learning problem. It should be immediately obvious how simple, small and compact this sort of function is; it's nothing complicated, and the whole thing is encoded by these weights, the W's and the b's, which is why we can throw away our database once it's trained. The way we train it is fairly simple as well: we put all the x's in our database through this function, we see what numbers they produce, we compare those to the y's in our database, and we use that to update the network weights. There are a bunch of different algorithms for doing this; we repeat it thousands and thousands of times, and eventually we have weights and biases that work for this particular function. You can see why neural networks are exciting from a scalability standpoint: they're very small and compact, we can understand their size and computational complexity, and both are completely independent of the size of the training data.

So we do this, we train a neural network, we throw away our data, and it looks something like this. It's still basically not working; it's doing something, but not what we want. Why? The reason is that machine learning is not a magic black box. You can't just train a neural network with no regard for anything; the results you get depend enormously on how you represent your input, how you represent your output, and how and when you use this function f. It's not as simple as looking at the big contents page of supervised learning algorithms, picking one, and trying it out. Additionally, there's something really bad about the problem we're trying to solve, which is that this function f isn't well defined. For example, if the character is standing still and we say go forward, if we just give him a target in the future to move towards, there are two different choices: he could lead with his left foot or with his right foot, and both choices get him to the goal. The neural network, or whatever machine learning algorithm you're using, doesn't know which one to choose, so it tends to produce an average of both, and that's what you see when the character floats along the ground: an average of leading with the left foot and leading with the right foot. So the question is: can we resolve this ambiguity? Regardless of how we tweak our input and output representation, it looks like we need to solve this problem first if we're going to have any chance of getting this working. The answer is yes, we can, and one way is to specify the timing directly: we can use the concept of a phase to tell us exactly the timing of the left foot and the right foot and how we're cycling through the animation.
For example, we can define the phase as a variable where, when the left foot first goes down, it's zero; when the right foot goes down, it's pi; and when the left foot goes down again, it's two pi. So it's a cyclic variable going around between zero and two pi. Then one idea is to use a separate f depending on the character's current phase. Assuming the character always has some phase value representing where he is in this cycle, we can try to use a different f for each part of it. Let's do this in the simplest way possible: we separate all the x's and y's in our database into different bins depending on the phase, so all the data where the character has his right foot down goes in a different bin from all the data where he has his left foot down, for example. Then at runtime, given the phase, we select which bin to use, and we use the function trained on that bin to generate our y from our x. Say we've binned our data into six different functions along the phase space; we get a new phase at runtime, we see which bin it falls into, which data is there, which function f we've trained, and we use it to produce our y from our x.

So let's try this again with a bunch of different machine learning algorithms. We'll set it up a little differently: the input will be exactly the same, the previous pose of the character and where we want to go in one second, but the output is now just one frame. We're not going to output a block of animation; we output one frame at a time. We select our function f depending on the phase, call it to get the character's next pose, and then update the phase value. We also measure how much the phase changes at each frame, so for each frame in the database we have the change in the phase, and we use that to advance it.

Let's try it with nearest neighbour regression. We get something that looks pretty similar to our original nearest neighbour regression where we were outputting blocks of animation, the motion matching-like setup, so that's a good sign that things are roughly working as intended. One thing you'll notice is that the cycle of the locomotion is much more strongly preserved in this setup: in the other one there was perhaps more diverse movement, where the phase could change fast or slow with different stepping patterns, whereas here we really see the cycle going on and on. We have the same jumping issue, where it clicks when it jumps to a new clip, but we can add blending like before and get a nice result. Let's see how the Gaussian process fares in this case. It basically doesn't work again, so I'm not sure what that means; probably it means that having lots of data is very important. We can sort of see that the phase is still cycling, it just looks a bit bizarre, so we have some sense that the phase idea is working, we're not just getting the gliding motion, but the quality of the output is not great.
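A minimal sketch of the binning scheme described above (illustrative; the per-bin regressors and the phase-change lookup are hypothetical stand-ins for whatever was trained on each bin's data):

```python
import numpy as np

NUM_BINS = 6
TWO_PI = 2.0 * np.pi

def phase_to_bin(phase):
    return int(phase / TWO_PI * NUM_BINS) % NUM_BINS

# Hypothetical stand-ins: one regressor per bin (nearest neighbour, Gaussian
# process or neural network in practice), plus the per-frame phase change
# which in the real system is labelled in the database.
fs = [lambda x, i=i: np.full(3, float(i)) for i in range(NUM_BINS)]
phase_change = lambda y: 0.4

def step(x, phase):
    y = fs[phase_to_bin(phase)](x)                 # use the function trained for this slice of the phase
    phase = (phase + phase_change(y)) % TWO_PI     # advance the phase for the next call
    return y, phase

pose, phase = np.zeros(3), 0.0
for _ in range(5):
    pose, phase = step(pose, phase)
```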
There are also some other questions we can ask about this approach. What if the phase is in between two different bins? What happens when it suddenly switches to a new bin; do we get some sort of discontinuity, some sort of jump? It also seems wasteful to train multiple functions f, because each bin has a different f, but many of them will be extremely similar, since they're only a small step in time apart from the previous one. And how can we use a neural network to attack this same problem?

This is where some previous research I did, called phase-functioned neural networks, comes in, and it's exactly what we did. The basic idea is a neural network whose weights are generated directly from the phase: the weights are a function of the phase, and they change continuously and smoothly along with it. As a diagram it looks something like this. We have as input our phase variable, which loops around in a circle, we have four sets of neural network weights, and we have our phase function. What the phase function does is interpolate these four sets of weights depending on the phase value, so for each phase value we get a slightly different mix of the four weight sets, and these interpolated weights are used as the main weights of our normal neural network, the one from the equation I showed before, which maps from x's to y's.

If we do this in exactly the same setup as before, we get something that looks like this. It's still not perfect, it's not extremely responsive, but the quality of the generated motion is actually quite nice, and it's smooth and varies continuously. If we keep tweaking the x's and y's, keep tweaking f, and incorporate all our best practices for training a neural network, we get something that looks more like this. What we have in the end is a character which can follow a trajectory nicely and produce natural-looking movement. To show how scalable this is, we also trained it on a bunch of data where the character walks over rough terrain: about a gigabyte of data, roughly an hour and a half of capture over different rough terrain. So the scalability is really there: we can train on absolutely huge amounts of data, and it works and adapts to all these varied situations. We can also give it the kinds of tags I described at the beginning. Here we tag whether the character should be crouching or not, and we can give a continuous tag based on the height of the ceiling, and the character naturally and somewhat smoothly adapts. Or we can give more discrete tags: here we give a tag saying the character should be jumping at this point in the future, and that's what the character does. It sees that in the database the only place this tag was present was in jumping motions, so that's what it plays, and because we're also giving the height of the terrain under the character as input, it can somewhat adapt its style of jump to what's below it. And the really nice thing is that once this neural network is working, the performance is pretty good for the amount of data it was trained on.
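A simplified sketch of the phase function idea (illustrative; the published phase-functioned neural network uses a cubic Catmull-Rom blend of the weight sets, whereas plain linear blending and made-up sizes are used here for brevity):

```python
import numpy as np

TWO_PI = 2.0 * np.pi
rng = np.random.default_rng(0)

def network(x, params):
    # Ordinary feed-forward network: the same activation(Wx + b) layers as before.
    h = x
    for W, b in params[:-1]:
        h = np.maximum(0.0, W @ h + b)
    W, b = params[-1]
    return W @ h + b

def random_params():
    return [(rng.normal(size=(32, 16)), np.zeros(32)),
            (rng.normal(size=(8, 32)),  np.zeros(8))]

# Four sets of weights spaced evenly around the phase circle.
control_points = [random_params() for _ in range(4)]

def blend_weights(phase):
    # Cyclic blend between the two nearest control points for this phase value.
    t = phase / TWO_PI * len(control_points)
    i0 = int(t) % len(control_points)
    i1 = (i0 + 1) % len(control_points)
    w = t - int(t)
    return [((1 - w) * W0 + w * W1, (1 - w) * b0 + w * b1)
            for (W0, b0), (W1, b1) in zip(control_points[i0], control_points[i1])]

def pfnn_forward(x, phase):
    return network(x, blend_weights(phase))  # the network's weights depend smoothly on the phase

y = pfnn_forward(rng.normal(size=16), 1.3)
```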
In its most compact form, what we get is just 10 megabytes of neural network weights, with a runtime of about one millisecond. Ten megabytes is pretty incredible if you consider that it was trained on literally gigabytes of motion data, all of which we've thrown away, keeping only these 10 megabytes. In this particular phase-functioned neural network approach we can also do some precomputation which uses more memory but evaluates a bit faster. So it's definitely delivering what it promises in regards to scalability with data. That's how you can generalize this solution, frame it in a more classic machine learning way, and try out a bunch of different experiments to see what works.

In conclusion, there are a couple of sacrifices we have to make for this approach. You have to give up precise control: if we think about those 15,000 animations made for Assassin's Creed Origins, at some point we're simply not going to be able to hand-author all of them, so we have to think about ways that scale, and many of those ways will require giving up precise control in many contexts. It also requires learning a whole new skill set: machine learning programming is a completely different beast from normal programming, it's very difficult, and it requires skills you may not have learned in school or over your career. And finally, it doesn't deal with a large number of special cases, so it's not as if we can throw out everything we already have and replace it with this; there will have to be a period of overlap between old systems and new systems. But for scalability, animation quality is losing the battle against complexity: every time someone wants to push the current state machine approach to add a new way of doing things, it suffers against the complexity of the system. We can use machine learning, or at least some ideas from machine learning, to try to balance that fight, and these ideas might be one of the best ways to make progress in this direction.

Some things we're looking at in the future: how can we remove this phase variable and deal with non-cyclic motions? How can we scale to hundreds of different styles? We talked about all these different tags; what if we had a huge database with hundreds of different styles, all tagged in detail, would this work and how would it work? And how can we continue to improve the quality? In some sense you can't get better quality than playing back the motion capture data directly, but how can we throw out the motion capture data and still retain a really high-quality solution? I also have a couple of bloopers, so you can see what it's really like to do machine learning every day: you get some nice artifacts like this, or this one, and here's a nice moonwalk he was doing. Okay, thank you very much. If you have any questions you can come up to the mics.
Q: Hi, great talk, great work, thanks for sharing. Two short ones: how do you generate the clips for the training data, and how are you controlling the character? And second, is there a research paper you can mention where we can get the details?

A: The clips for the training data are taken from the big, long captures we did. If we're talking about one-second clips, we basically have overlapping one-second clips all the way through the database, unless a section was tagged as junk. And for the paper, if you Google "phase-functioned neural networks" it will pop up.

Q: Hey, great talk. In your diagram of the phase neural network, it looked like you had a separate network feeding weights into the existing neural network. Is there a reason why you couldn't just use the phase as an input variable, an input feature, into the network? Did that yield errors or something?

A: You can use the phase as an additional input variable, and it works somewhat, but the quality is not as high. The details of why are a bit complicated to explain, but roughly: when you feed the phase in at the side, it's much more like binning the data into really separate bins, whereas when you give it at the top it's more like giving another variable saying how to blend between the whole database at once. Giving it in at the side is really like taking a slice of the database with just the phase values you want.

Q: Great talk. I wanted to ask about the tagging process: are you looking towards automating that, or is it a very manual process, and who would do that kind of work? Because as the data scales, that's obviously a lot of work to tag.

A: One of the great things about doing this tagging is that you can also use it to train animation classifiers, so you'd hope to get to a point where you have a large enough database that, from then on, you can do the tagging automatically with a classifier.

Q: Hello. I was wondering how much iteration was done on representing the data going into the neural network, because I could see the rotations, as part of the pose, being very hard for a neural network to consume and make sense of.

A: There's a lot of iteration; that's the black magic of machine learning, and that's why you need good intuition and lots of different experiments. You can see exactly how we got the results we did in our paper if you look it up, but yes, that's one thing you need to play around with a lot to get good results.

Q: Excellent talk, very interesting. A lot of the examples I've seen, both in this talk and for motion matching, were largely for character locomotion and navigation animation. If you're working in a space where you have, say, a very sophisticated combat simulation in your game, have you found any success with this approach for driving additive or layered animations, maybe where your animation data doesn't have variables like speed or character intent, and it's just something like "I got shot"?

A: Yes, you can use the motion matching approach to drive events. In For Honor the attacks are actually matched too: you have an input which says "please play this style of attack in this amount of time", and it tries to match the best one for the current pose and other factors. You should probably go to the UFC talk tomorrow; I'm sure they'll be talking a lot more about that sort of thing, and I'm sure it will be super insightful.

Q: Hi, great talk. Is it possible to incrementally train these sorts of neural networks? So if I add a couple of new clips of a new kind of turn or something, I don't have to retrain the whole thing for tens of hours of computation time?
A: In theory, yes. There is some research showing how you can incrementally train neural networks. I don't have any personal experience doing it, but the research is there at least; it just needs to be tried out, I think.

Q: Hello, has this approach been applied to quadrupeds?

A: Not to my knowledge. I guess we'll see. One nice thing about this approach for quadrupeds is that it's very hard to get a quadruped to act, but you can get a whole bunch of raw animation data from a quadruped just by putting it in a motion capture studio.

Q: Hi, great talk. Did you have to overlay a lot of IK, for example to lock the hands and feet, and did you try to augment the rotation data with the absolute positions of the joints?

A: In the examples I showed there was no IK or anything; all the joints were represented in character space rather than local space. In the previous paper, with the character walking over terrain, there was a little bit of IK, but not much; it's not that strongly required. Again, it depends on your representation and these sorts of things.

Q: How much training data do you use?

A: In the things I was showing here it was about 200 megabytes, which I think is probably about half an hour to an hour of data. You saw in the clip that there were about three of us in the motion capture studio doing a whole bunch of random locomotion moves, and I think we did about half an hour of that, something like that.

Q: Great work. I was curious: you mentioned that you lose fine detail. Do you switch to another system when you have to animate things like cutscenes, where somebody has to pick up a bottle or something specific, or can you use this for that?

A: You can just switch to a different system, there's no reason why not, but you could also potentially do it in the same system. Like I was saying, you could tag the cutscene with one specific tag and make it a super high priority tag, so it makes sure to start that cutscene when it fires. But when you want that much control, it's probably easier to blend out.

Q: Cool, thanks, great talk, and thanks for making it very approachable. Two questions. How do you approach fine-tuning: where does that fall now between the animation lead and the programmer on the neural network side of things? And second, have you applied this in a layered fashion, like run-and-gun, where you break things apart into different sub-components?

A: As far as fine-tuning is concerned, it's still quite early, and we probably need dedicated tools to make tuning a viable option. If your data is relatively small and you're doing something like motion matching, you can actually edit the data, and that works pretty well. If you're training a neural network it's much harder to guarantee what the results will be like, and it will probably need dedicated tools. As for layering, the philosophy of machine learning is not to separate things out into layers; rather, you hope that it learns, in some sense, what is a layer and what isn't.
For example, the crouching you saw: although we never captured any crouching on rough terrain, if you crouch and walk over the rough terrain in our little demo, it does somewhat adapt the feet positions and so on. It has somehow learned that crouching is independent of the other actions you do on rough terrain. The hope is that if you provide it with enough data, it will learn to do all these sorts of layering operations for you, and you can keep everything in one system. That's the idea.

Q: Hello, thank you, very exciting talk. I want to ask about the neural network training: how do you decide how many layers to use, how many nodes in each layer, which activation function, and whether or not to normalize the input? Do you feel everything is now optimized, or is there still work to do to find the optimal solution?

A: There are some rules of thumb and intuitions you pick up if you do machine learning, and you can also read the previous literature to see what researchers doing similar applications have done, to get a rough idea of how you might want to structure your neural network. There are always going to be improvements, and actually that's one of the best things about using neural networks: you can import all the future improvements people come up with. Maybe tomorrow someone invents a new activation function which improves everything; we can implement it pretty much then and there, try it tomorrow morning, and see what happens, and if it solves all our problems, that's amazing. One huge benefit of generalizing and making this more like a general machine learning framework is being able to borrow from all the other people developing these things.

Q: Nicole Lazzaro from XEODesign, and Follow the White Rabbit. Really great talk, and definitely very approachable. I was curious about taking this approach to other aspects of animation: if you had, say, an Errol Flynn kind of style of motion, maybe in the attacks or something like that. Locomotion, how the character moves, is important, and you want different kinds of motion for different types of characters, but if you have your star character with different special kinds of moves, will this approach work, or what changes?

A: It's a bit difficult if you really want hand-crafted animation, because you need such vast quantities of data to get good results. My advice would probably be to start with a database of locomotion data which is motion-captured and large, and try to come up with procedural tweaks which automatically stylize that motion in the way you want, or something like that, so that you can easily produce large amounts of stylized data. If you can do that, then yes, it works in exactly the same way; it can work for stylized motion too.

Okay, I think that's everyone. Thank you very much.
Info
Channel: GDC
Views: 24,519
Rating: 4.9301076 out of 5
Keywords: gdc, talk, panel, game, games, gaming, development, hd, design, ubisoft, assassin's creed origin
Id: o-QLSjSSyVk
Channel Id: undefined
Length: 54min 27sec (3267 seconds)
Published: Fri Oct 30 2020