Dear Fellow Scholars, this is Two Minute Papers
with Károly Zsolnai-Fehér. If we have an animated movie or a computer
game with quadrupeds, and we are yearning for really high-quality, lifelike animations,
motion capture is often the go-to tool for the job. Motion capture means that we put an actor,
in our case a dog, in the studio, ask it to perform sitting, trotting, pacing and jumping,
record its motion, and transfer it onto our virtual character. In an earlier work, a learning-based technique
was introduced by the name Mode-Adaptive Neural Network, and it was able to correctly weave
together these previously recorded motions, and not only that, but it also addressed these
unnatural sliding motions that were produced by previous works. As you see here, it also worked well on more
challenging landscapes. We talked about this paper approximately a
hundred videos ago, or in other words, a little more than a year ago, and I noted that it
was scientifically interesting, it was evaluated well, it had all the ingredients for a truly
excellent paper. But one thing was missing. So what is that one thing? Well, we haven’t seen the characters interacting
with the scene itself. If you liked this previous paper, you are
going to be elated by this one because this new work is from the very same group, and
goes by the name Neural State Machine, and introduces character-scene interactions for
bipeds. Now, we suddenly jumped from a quadruped paper
to a biped one, and the reason for this is that I was looking to introduce the concept
of foot sliding, which will be measured later for this new method too. Stay tuned! So, in this new problem formulation, we need
to guide the character to a challenging end state, for instance, sitting in a chair, while
being able to maneuver through all kinds of geometry.
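To picture what that means in practice, you can think of the finished system as a black box that is asked, frame by frame, for the character's next pose given the current pose, the goal, for instance, sit on that chair over there, and a sample of the surrounding geometry. Here is a purely conceptual sketch; the class, the model interface, and the geometry query are assumptions for illustration, not the paper's actual API.

```python
class GoalDrivenController:
    """Hypothetical per-frame wrapper around a trained, goal-conditioned
    motion model such as a Neural State Machine. The model interface
    (model.predict) and the scene query are assumptions for illustration,
    not the paper's actual API."""

    def __init__(self, model, scene):
        self.model = model   # trained network: (pose, goal, geometry) -> next pose
        self.scene = scene   # environment geometry the character can sense

    def step(self, pose, goal):
        # Sample the geometry around the character (chairs, walls, obstacles)
        # so the network can adapt its motion to the surroundings.
        local_geometry = self.scene.sample_around(pose["root_position"])
        # One forward pass per frame: current pose + goal -> next pose,
        # which is what makes real-time use in games plausible.
        return self.model.predict(pose, goal, local_geometry)
```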
We’ll use the chair example a fair bit in the next minute or two, so I’ll stress that this can do a whole lot more; the chair is
just used as a vehicle to get a taste of how this technique works. But the end state needn’t just be some kind
of chair. It can be any chair! This chair may have all kinds of different
heights and shapes, and the agent has to be able to change the animations and stitch them
together correctly regardless of the geometry. To achieve this, the authors propose an interesting
new data augmentation model. Since we are working with neural networks,
we already have a training set to teach it about motion, and data augmentation means
that we extend this dataset with lots and lots of new information to make the AI generalize
better to unseen, real-world examples. So, how is this done here, exactly? Well, the authors came up with a clever way to do it. Let’s walk through their five prescribed
steps. One, let’s use motion capture data, have
the subject sit down and see what the contact points are when it happens. Two, we then record the curves that describe
the entirety of the motion of sitting down. So far so good, but we are not interested
in one kind of chair, we want it to sit in all kinds of chairs, so three, generate a
large selection of different geometries and adjust the location of these contact points
accordingly. Four, change the motion curves so they indeed
end at the new, transformed contact points. And five, move the joints of the character
to make it follow this motion curve and compute the evolution of the character pose. We then pair up this motion with the chair geometry and chuck it into the new, augmented training set.
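To make these five steps a bit more concrete, here is a minimal sketch of what such an augmentation loop could look like. This is not the authors' code: the chair parameters, the simple rescaling of the contact points, and the linear warping of the motion curve are all assumptions chosen for illustration.

```python
import numpy as np

# Seat dimensions of the chair used during motion capture (assumed values, meters).
REFERENCE_CHAIR = {"seat_width": 0.45, "seat_depth": 0.42, "seat_height": 0.45}

def random_chair(rng):
    """Step three: sample a new chair geometry (simplified to three numbers)."""
    return {"seat_width": rng.uniform(0.40, 0.60),
            "seat_depth": rng.uniform(0.35, 0.50),
            "seat_height": rng.uniform(0.35, 0.55)}

def transform_contacts(contacts, chair):
    """Step three, continued: move the recorded contact points so they land on
    the new chair. Here this is a simple rescaling relative to the reference
    chair; the real method can be more involved. contacts: (num_contacts, 3)."""
    scale = np.array([chair["seat_width"]  / REFERENCE_CHAIR["seat_width"],
                      chair["seat_height"] / REFERENCE_CHAIR["seat_height"],
                      chair["seat_depth"]  / REFERENCE_CHAIR["seat_depth"]])
    return contacts * scale   # assumes x = width, y = height (up), z = depth

def warp_motion_curve(curve, old_end, new_end):
    """Step four: deform the motion curve so it ends at the new contact point.
    The offset is blended in linearly over time, a simplifying assumption."""
    blend = np.linspace(0.0, 1.0, len(curve))[:, None]
    return curve + blend * (new_end - old_end)   # curve: (num_frames, 3)

def augment(motion_curve, contacts, num_samples=1000, seed=0):
    """Produce many (motion, chair) training pairs from one recorded sit-down.
    Step five, solving for the full-body joint angles along the new curve,
    is left out here; an IK pass would normally recover the character poses."""
    rng = np.random.default_rng(seed)
    dataset = []
    for _ in range(num_samples):
        chair = random_chair(rng)
        new_contacts = transform_contacts(contacts, chair)
        new_curve = warp_motion_curve(motion_curve, contacts[-1], new_contacts[-1])
        dataset.append((new_curve, chair))
    return dataset
```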
Now, make no mistake, the paper contains much, much more than this, so make sure to have a look in the video description. So what do we get for all this work? Well, have a look at this trembly character
from a previous paper, and look at the new synthesized motions. Natural, smooth, creamy, and I don’t see
artifacts. Also, here you see some results that measure
the amount of foot sliding during these animations, which is subject to minimization. That means that the smaller the bars are, the better.
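For context, foot sliding is usually quantified as how far a foot travels horizontally while it is supposed to be planted on the ground. Here is a rough sketch of such a metric, with an assumed contact threshold, and not necessarily the exact formula used in the paper.

```python
import numpy as np

def foot_sliding(foot_positions, contact_height=0.02):
    """A rough foot-sliding metric: total horizontal travel of one foot
    during frames where it is low enough to count as 'in contact' with
    the ground. The 2 cm threshold and the exact formula are assumptions;
    papers differ in how they weight near-contact frames.
    foot_positions: (num_frames, 3) world-space positions, y is up."""
    in_contact = foot_positions[:-1, 1] < contact_height
    step = foot_positions[1:, [0, 2]] - foot_positions[:-1, [0, 2]]
    return np.linalg.norm(step, axis=1)[in_contact].sum()   # meters of sliding
```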
With NSM, you see how this Neural State Machine method produces much less foot sliding than previous methods, and now we see how cool it is that we talked about the quadruped paper as well, because we see that it even beats MANN, the Mode-Adaptive Neural Network from the previous paper. That one had very little foot sliding, and
apparently, it can still be improved by quite a bit. The positional and rotational errors in the
animation it offers are also by far the lowest of the bunch.
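For the curious, positional and rotational errors of this kind are typically something like the average distance between predicted and reference joint positions, and the average angle between predicted and reference joint rotations. A minimal sketch, assuming quaternion joint rotations; the paper's exact definitions may differ.

```python
import numpy as np

def pose_errors(pred_positions, true_positions, pred_rotations, true_rotations):
    """Rough per-joint error metrics, purely illustrative. Positions are
    (num_joints, 3) arrays; rotations are (num_joints, 4) unit quaternions."""
    positional = np.linalg.norm(pred_positions - true_positions, axis=1).mean()
    # Angle between two unit quaternions: 2 * arccos(|dot product|).
    dots = np.abs(np.sum(pred_rotations * true_rotations, axis=1)).clip(0.0, 1.0)
    rotational = np.degrees(2.0 * np.arccos(dots)).mean()
    return positional, rotational   # meters, degrees
```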
Since it works in real time, it can also be used for computer games and virtual reality applications. And all this improvement within a year of
work. What a time to be alive! If you're a researcher or a startup looking
for cheap GPU compute to run these algorithms, check out Lambda GPU Cloud. I've talked about Lambda's GPU workstations
in other videos and am happy to tell you that they're offering GPU cloud services as well. The Lambda GPU Cloud can train ImageNet to
93% accuracy for less than $19! Lambda's web-based IDE lets you easily access your instance right
in your browser. And finally, hold on to your papers, because
the Lambda GPU Cloud costs less than half of AWS and Azure. Make sure to go to lambdalabs.com/papers and
sign up for one of their amazing GPU instances today. Thanks for watching and for your generous
support, and I'll see you next time!
Amazing, if only this were a viable solution for indie devs right now! I am starting to love Unity's Mecanim, but this is the next level. :)
In theory, would this save on resource usage when running a game? For example, if you only need the initial resources to run the AI, and it outputs technically unlimited animations on the spot, would that save the resources that storing and programming every single animation individually would otherwise use?